Hi and welcome! I’ve been playing Pokemon since I was about 6 years old. The game has come a long way from the original 151, expanding to nearly 900 Pokemon! While some critics might say the quality of Pokemon designs has declined (Vanilluxe), it’s hard not to be impressed by the longevity of the franchise. I wanted to take some time to combine my childhood hobby with my current one.

Setup

First, let us take a look at the data we’re working with!

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
rm(list = ls())
library(tidyverse)
library(ggrepel)
library(png)
library(grid)
setwd("~/Desktop/code/R/Pokemon")
pokemon <- read_csv("Pokemon.csv")
glimpse(pokemon)
## Rows: 800
## Columns: 13
## $ `#`        <dbl> 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14,…
## $ Name       <chr> "Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur…
## $ `Type 1`   <chr> "Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire"…
## $ `Type 2`   <chr> "Poison", "Poison", "Poison", "Poison", NA, NA, "Flying", …
## $ Total      <dbl> 318, 405, 525, 625, 309, 405, 534, 634, 634, 314, 405, 530…
## $ HP         <dbl> 45, 60, 80, 80, 39, 58, 78, 78, 78, 44, 59, 79, 79, 45, 50…
## $ Attack     <dbl> 49, 62, 82, 100, 52, 64, 84, 130, 104, 48, 63, 83, 103, 30…
## $ Defense    <dbl> 49, 63, 83, 123, 43, 58, 78, 111, 78, 65, 80, 100, 120, 35…
## $ `Sp. Atk`  <dbl> 65, 80, 100, 122, 60, 80, 109, 130, 159, 50, 65, 85, 135, …
## $ `Sp. Def`  <dbl> 65, 80, 100, 120, 50, 65, 85, 85, 115, 64, 80, 105, 115, 2…
## $ Speed      <dbl> 45, 60, 80, 80, 65, 80, 100, 100, 100, 43, 58, 78, 78, 45,…
## $ Generation <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Legendary  <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…

There are 800 rows of data covering the first six generations of Pokemon. The main series games are on the eighth generation, so this data is a little dated. If you compare the number of rows in the data with the “National Dex” numbering, you might be thinking, “There are only 721 Pokemon. How did we get 800 rows of data?” This is because variant forms (such as “Mega” Evolutions) receive their own row. It looks like we have a column for most of the basic information we could want, such as Name, Type (Primary and Secondary), Statistics, etc.
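To make the rows-versus-species gap concrete, here is a minimal sketch that uses only the first nine Pokedex numbers visible in the `glimpse()` output above. The variable names are mine, not part of the dataset.

```r
# First nine Pokedex numbers as they appear in the glimpse() output above.
dex_num <- c(1, 2, 3, 3, 4, 5, 6, 6, 6)

n_rows    <- length(dex_num)          # one entry per row (every form counts)
n_species <- length(unique(dex_num))  # one entry per Pokedex number

n_rows - n_species                    # rows that are extra forms

# duplicated() flags every repeat of a number that was already seen.
which(duplicated(dex_num))
```

On the full data, the same comparison would be `nrow(pokemon)` against ``length(unique(pokemon$`#`))``, which should reproduce the 800 and 721 discussed above.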

Data Cleaning & Feature Engineering

The data we have has plenty of information, but we can think of a few more columns to add from the columns we already have.

pokemon <- pokemon %>% 
  rename(SpAtk = `Sp. Atk`, SpDef = `Sp. Def`, Type1 = `Type 1`, Type2 = `Type 2`, PokedexNum = `#`)

pokemon <- pokemon %>% 
  mutate(AtkTotal = Attack + SpAtk,
         DefTotal = Defense + SpDef,
         isMega = grepl("Mega ", Name, fixed = TRUE),  # trailing space so "Meganium" is not flagged
         isMultiType = !is.na(Type2),
         classification = if_else(isMega == TRUE, "Mega", 
                                  if_else(Legendary == TRUE, "Legendary", "Normal"))
         )
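One thing to watch when flagging Mega forms by name: a plain substring match on "Mega" also matches Meganium, which is a regular Generation 2 Pokemon. Here is a small sketch comparing two patterns. The sample names are hypothetical and assume the dataset writes Mega forms as base name plus "Mega " plus form name, as in "VenusaurMega Venusaur".

```r
# Hypothetical sample of names in the dataset's format.
# "Meganium" is a regular Pokemon, not a Mega Evolution.
name <- c("Venusaur", "VenusaurMega Venusaur", "Meganium", "CharizardMega Charizard X")

loose  <- grepl("Mega", name, fixed = TRUE)   # plain substring match
strict <- grepl("Mega ", name, fixed = TRUE)  # requires the trailing space

data.frame(name, loose, strict)
```

The trailing-space pattern is only as good as the naming convention it assumes, so it is worth eyeballing `Name[isMega]` after building the column.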

Some of the columns weren’t read in with the type we want (e.g. character or double instead of factor), so we can manually convert them. I’ll change my “dbl” columns to “int” (not a huge deal), and my categorical columns to factors (important for modelling).

factor_cols = c("Generation", "classification")
int_cols = c("PokedexNum", "Total", "HP", "Attack", "Defense", "SpAtk", "SpDef", "Speed")

pokemon[factor_cols] <- lapply(pokemon[factor_cols], factor)
pokemon[int_cols] <- lapply(pokemon[int_cols], as.integer)

glimpse(pokemon)
## Rows: 800
## Columns: 18
## $ PokedexNum     <int> 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 11, 12, 13,…
## $ Name           <chr> "Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venu…
## $ Type1          <chr> "Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "F…
## $ Type2          <chr> "Poison", "Poison", "Poison", "Poison", NA, NA, "Flyin…
## $ Total          <int> 318, 405, 525, 625, 309, 405, 534, 634, 634, 314, 405,…
## $ HP             <int> 45, 60, 80, 80, 39, 58, 78, 78, 78, 44, 59, 79, 79, 45…
## $ Attack         <int> 49, 62, 82, 100, 52, 64, 84, 130, 104, 48, 63, 83, 103…
## $ Defense        <int> 49, 63, 83, 123, 43, 58, 78, 111, 78, 65, 80, 100, 120…
## $ SpAtk          <int> 65, 80, 100, 122, 60, 80, 109, 130, 159, 50, 65, 85, 1…
## $ SpDef          <int> 65, 80, 100, 120, 50, 65, 85, 85, 115, 64, 80, 105, 11…
## $ Speed          <int> 45, 60, 80, 80, 65, 80, 100, 100, 100, 43, 58, 78, 78,…
## $ Generation     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Legendary      <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
## $ AtkTotal       <dbl> 114, 142, 182, 222, 112, 144, 193, 260, 263, 98, 128, …
## $ DefTotal       <dbl> 114, 143, 183, 243, 93, 123, 163, 196, 193, 129, 160, …
## $ isMega         <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, …
## $ isMultiType    <lgl> TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE…
## $ classification <fct> Normal, Normal, Normal, Mega, Normal, Normal, Normal, …
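For anyone unfamiliar with the `df[cols] <- lapply(df[cols], f)` idiom used for the conversion, here is a minimal sketch on a toy data frame. The values are illustrative and not taken from the real file.

```r
# Toy data frame with the same kinds of columns (values are illustrative).
df <- data.frame(
  Generation = c(1, 1, 2),
  Total      = c(318, 405, 525),
  Name       = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

factor_cols <- "Generation"
int_cols    <- "Total"

# df[cols] keeps a list-of-columns shape, so lapply() converts column by column
# and the assignment writes each converted column back in place.
df[factor_cols] <- lapply(df[factor_cols], factor)
df[int_cols]    <- lapply(df[int_cols], as.integer)

sapply(df, class)
levels(df$Generation)
```

Columns that are not listed are left alone, which is why AtkTotal and DefTotal still show up as `<dbl>` in the glimpse above.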

EDA (Type)

Now let’s start making some fun graphs!

totals <- pokemon %>% 
  group_by(Type1) %>% 
  summarise(count = n())
  
# Generation 1 Color Scheme
pokemon %>% 
  ggplot(aes(x = fct_infreq(Type1))) +
  geom_bar(fill = "#84ADD7", color = "#F2684A") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  labs(title = "Frequency of Primary Types", x = "Type", y = "Frequency") + 
  geom_text(aes(Type1, count + 5, label = count, fill = NULL), data = totals)

This is my first graph, so I thought it was appropriate to give it a Gen 1 color scheme. That means Blue and Red, as those were the first two games released in North America. Note: In Japan, it was Green and Red, which is why the remakes of Gen 1 were FireRed and LeafGreen.

Water seems to be by far the most common primary type, with Flying being the least common with a staggering 4! A fun fact about Flying types is that until Generation 5 there were no pure/primary Flying types. The number is really 3, because one of them (Tornadus) has two forms. On that note, our data going forward is a little skewed because it overrepresents Pokemon with multiple forms, even when the forms don’t change the stats. It’s not a huge deal but something to take note of. The second least common type is Fairy, but the Fairy type was only recently introduced (in Generation 6)! I think it’s fascinating how so many types hover right around the 27-32 range. I’d like to think Water being the most common is a nod to how the Earth is mostly water, but that might be a little [Farfetch’d](https://bulbapedia.bulbagarden.net/wiki/Farfetch%27d_(Pok%C3%A9mon%29).
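If the overrepresentation of multi-form Pokemon ever matters, one option is to keep a single row per Pokedex number before counting. This is a sketch on toy rows, not what the charts in this post do. The form names are my assumption about how the dataset writes them.

```r
# Toy rows in the dataset's shape; the two Tornadus forms share a Pokedex number.
toy <- data.frame(
  PokedexNum = c(641, 641, 714, 715, 7),
  Name  = c("TornadusIncarnate Forme", "TornadusTherian Forme",
            "Noibat", "Noivern", "Squirtle"),
  Type1 = c("Flying", "Flying", "Flying", "Flying", "Water"),
  stringsAsFactors = FALSE
)

# Counting rows: every form counts once.
table(toy$Type1)

# Counting species: keep only the first row seen for each Pokedex number.
one_per_species <- toy[!duplicated(toy$PokedexNum), ]
table(one_per_species$Type1)
```

The trade-off is that this also throws away forms that really do differ (Mega Evolutions, for example), so it suits frequency counts better than stat comparisons.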

totals <- pokemon %>% 
  filter(!is.na(Type2)) %>% 
  group_by(Type2) %>% 
  summarise(count = n())

# Generation 2 Color Scheme
pokemon %>% 
  filter(!is.na(Type2)) %>% 
  ggplot(aes(x = fct_infreq(Type2))) +
  geom_bar(fill = "#C8CFD7", color = "#feff6a") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  labs(title = "Frequency of Secondary Types", x = "Type", y = "Frequency") +
  geom_text(aes(Type2, count + 5, label = count, fill = NULL), data = totals)

Flying is the most common secondary type by a landslide. As previously mentioned, very few Pokemon have Flying as a primary type, so this isn’t too shocking. I think it’s interesting to see Water and Normal shift to the back of the pack here as well. Poison is commonly paired with Grass or Bug (two of the most common primary types). It’s interesting to see the creators’ choices between primary and secondary typing. For Pokemon with multiple types, I’m not sure what difference it makes which is “primary” and which is “secondary”.

Note 2: Second graph == Gen 2 Color Scheme (Gold & Silver)

type_combinations <- pokemon %>%
  mutate(Type2 = ifelse(is.na(Type2), "", Type2)) %>% 
  group_by(Type1, Type2) %>%
  summarise(count=n())

#Pikachu Color Scheme
type_combinations %>% 
  ggplot(aes(x=Type1,y=as.character(Type2))) + 
  geom_tile(aes(fill = count), show.legend = FALSE) +
  geom_text(aes(label=count)) +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  labs(x="Type 1", y="Type 2",
       title="Type Combinations") +   
  scale_fill_gradient(low="#f6bd20", high="#c52018") 

This visualization gives us insight into which types are paired together. Some of these are less surprising than others. For example, Poison being combined with Bug and Grass isn’t hard to make sense of, whereas Electric/Water might seem a little more contradictory. I think one interesting takeaway is that despite over 700 distinct Pokemon, only about half of the possible combinations have been explored! Imagine what a Ghost/Rock Pokemon could look like! Flying and Fairy obviously have the most ground to make up. There are 39 type combinations with exactly one Pokemon (order matters, i.e. Rock/Bug is counted separately from Bug/Rock).
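Here is the arithmetic behind "possible combinations", as a sketch. I’m assuming order matters (as in the heat map) and that a single-type Pokemon counts as its own combination. The toy data frame is made up to show the counting and is not the real data.

```r
n_types <- 18

single_type   <- n_types                      # Type2 is missing
ordered_pairs <- n_types * (n_types - 1)      # Type1/Type2 with Type1 != Type2
possible      <- single_type + ordered_pairs  # cells that could ever be filled
possible

# Toy data: how many combinations are used, and how many occur exactly once?
toy <- data.frame(
  Type1 = c("Grass", "Grass", "Fire", "Bug", "Rock"),
  Type2 = c("Poison", "Poison", "", "Rock", "Bug"),
  stringsAsFactors = FALSE
)
combo_counts <- table(paste(toy$Type1, toy$Type2, sep = "/"))
n_used       <- length(combo_counts)     # distinct combinations present
n_singletons <- sum(combo_counts == 1)   # combinations with exactly one Pokemon
```

On the real data, `nrow(type_combinations) / possible` would give the share of combinations explored, and `sum(type_combinations$count == 1)` the number of one-off combinations.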

# Bug, Dark, Dragon, Electric, Fairy, Fighting, Fire, Flying, Ghost, Grass, Ground, Ice, Normal, Poison, Psychic, Rock, Steel, Water
type_colors = c("#A8B820", "#705848", "#7038F8", "#F8D030", "#EE99AC", "#C03028","#F08030","#A890F0",
                "#705898", "#78C850", "#E0C068", "#98D8D8","#A8A878", "#A040A0", "#F85888", "#B8A038",
                "#B8B8D0", "#6890F0")

type_colors_outline = c("#C6D16E", "#49392F", "#4924A1", "#A1871F", "#9B6470", "#7D1F1A", "#9C531F",
                        "#6D5E9C", "#493963", "#4E8234", "#927D44", "#638D8D", "#6D6D4E", "#682A68",
                        "#A13959", "#786824", "#787887", "#445E9C")

pokemon %>% 
  ggplot(aes(x = Type1, y = Total, fill = Type1, color = Type1)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Stats by Primary Type") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none")

Dragon seems to be the strongest type. This makes sense, as Dragon Pokemon are typically rare and “pseudo legendary”. Most types have a median value between 400 and 450, while Dragon is right around 600 (for reference, it’s common for a Legendary to have a stat total of 600). The Flying type is skewed upward by its small sample size, consisting of one legendary and one pseudo-legendary evolutionary line. Psychic has the greatest variation, with Pokemon spanning the whole spectrum! The extremes come from the type containing some of the strongest Pokemon (Mewtwo/Deoxys/Hoopa) and some of the weakest (Spoink/Abra).

pokemon %>% 
  filter(Legendary == TRUE) %>% 
  ggplot(aes(x=Type1, fill = Type1, color = Type1)) +
  geom_bar(show.legend = FALSE) + 
  scale_fill_manual(values = type_colors[-c(1,6, 14)],
                    guide = "none") +
  scale_color_manual(values = type_colors_outline[-c(1,6, 14)],
                    guide = "none") +
  labs(title = "Primary Type of Legendary Pokemon")

This emphasizes that Psychic and Dragon contain some of the most powerful Pokemon in existence. Notably, some types are missing! These types are Bug, Fighting, and Poison. To me, it’s a little surprising that these types haven’t received a single Legendary Pokemon in six generations, but there are only a few introduced each generation. I’m sure from a conceptual and marketing standpoint, it’s not easy to make a Legendary Pokemon centered around one of these types.
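Dropping colors by position with `type_colors[-c(1, 6, 14)]` works here, but it relies on remembering which index belongs to which type. A sketch of an alternative is to name the palette once and look colors up by type. `type_names` and `named_colors` are names I’m introducing for illustration.

```r
# Same order as the comment above the palette in this post.
type_names <- c("Bug", "Dark", "Dragon", "Electric", "Fairy", "Fighting", "Fire",
                "Flying", "Ghost", "Grass", "Ground", "Ice", "Normal", "Poison",
                "Psychic", "Rock", "Steel", "Water")

type_colors <- c("#A8B820", "#705848", "#7038F8", "#F8D030", "#EE99AC", "#C03028",
                 "#F08030", "#A890F0", "#705898", "#78C850", "#E0C068", "#98D8D8",
                 "#A8A878", "#A040A0", "#F85888", "#B8A038", "#B8B8D0", "#6890F0")

named_colors <- setNames(type_colors, type_names)

# The positions dropped in the Legendary chart are exactly the missing types.
type_names[c(1, 6, 14)]

# Look colors up by name for whatever subset of types is present.
present <- c("Dragon", "Psychic", "Water")  # hypothetical subset
named_colors[present]
```

As far as I know, `scale_fill_manual(values = named_colors)` matches a named vector to the factor levels by name, so a filtered plot should pick up the right colors without any manual dropping.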

# Latios/Latias Color Scheme
pokemon %>% 
  ggplot(aes(fill = Legendary, x=Type1)) +
  geom_bar(position="stack") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) + 
  scale_fill_manual(values = c("#cd696e", "#7db5da")) +
  labs(title = "Legendary Pokemon by Primary Type", x = "Primary Type", y = "Frequency")

Notice the lack of blue in three certain columns. This also shows the scarcity of Legendary Pokemon.

# Xerneas/Yveltal Color Scheme
pokemon %>% 
  ggplot(aes(fill = isMega, x=Type1)) +
  geom_bar(position="stack") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) + 
  scale_fill_manual(values = c("#e9351c", "#6275b9")) +
  labs(title = "Mega Pokemon by Primary Type", x = "Primary Type", y = "Frequency")

Mega Pokemon are even rarer! They were only recently introduced in Generation 6 alongside the new Fairy type. When Mega Pokemon were first introduced only Pokemon from Generation 1 were granted Mega evolutions, but this exclusivity has since expanded. Mega Pokemon added an amazing new aspect to competitive Pokemon, as many of them basically had the stats of a Legendary Pokemon (which are sometimes not allowed in the OU tier). Some Mega Pokemon were deemed “overpowered” and relegated to the “Ubers” tier. Mega Blaziken and Mega Gengar were two that I remember being banned relatively quickly.

EDA (Stats)

We’ve covered a lot of ground with regard to typing. Let us move on to inspecting the individual Pokemon stats.

is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

pokemon %>% 
  ggplot(aes(x = Type1, y = HP, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "HP by Primary Type")

pokemon %>% 
  filter(is_outlier(HP) == TRUE) %>% 
  mutate(HPPercent = round(HP / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, HP, HPPercent)
## # A tibble: 19 x 7
##    PokedexNum Name                  Type1    Type2  Total    HP HPPercent
##         <int> <chr>                 <chr>    <chr>  <int> <int>     <dbl>
##  1         40 Wigglytuff            Normal   Fairy    435   140      0.32
##  2        113 Chansey               Normal   <NA>     450   250      0.56
##  3        131 Lapras                Water    Ice      535   130      0.24
##  4        134 Vaporeon              Water    <NA>     525   130      0.25
##  5        143 Snorlax               Normal   <NA>     540   160      0.3 
##  6        202 Wobbuffet             Psychic  <NA>     405   190      0.47
##  7        242 Blissey               Normal   <NA>     540   255      0.47
##  8        289 Slaking               Normal   <NA>     670   150      0.22
##  9        292 Shedinja              Bug      Ghost    236     1      0   
## 10        297 Hariyama              Fighting <NA>     474   144      0.3 
## 11        320 Wailmer               Water    <NA>     400   130      0.32
## 12        321 Wailord               Water    <NA>     500   170      0.34
## 13        426 Drifblim              Ghost    Flying   498   150      0.3 
## 14        446 Munchlax              Normal   <NA>     390   135      0.35
## 15        487 GiratinaAltered Forme Ghost    Dragon   680   150      0.22
## 16        487 GiratinaOrigin Forme  Ghost    Dragon   680   150      0.22
## 17        594 Alomomola             Water    <NA>     470   165      0.35
## 18        716 Xerneas               Fairy    <NA>     680   126      0.19
## 19        717 Yveltal               Dark     Flying   680   126      0.19

HP seems to be pretty consistent across all types. The two big outliers for the Normal type are Blissey and Chansey. We’ll come back to them later. The Psychic outlier is Wobbuffet, which is interesting because its HP stat accounts for nearly 50% of its total stats (similar to Chansey and Blissey). Chansey leads the pack in this regard, with 56% of her total stats attributed to HP (Chansey and Blissey are always female). The Bug Pokemon that looks like it has no HP is Shedinja, whose HP stat is 1. Another Bug type, Shuckle, will show up later for its staggering defensive stats.
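The `is_outlier()` helper is the usual boxplot rule: anything more than 1.5 × IQR below the first quartile or above the third quartile gets flagged. Here is a small worked sketch on made-up HP values so the fences are easy to check by hand.

```r
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

# Made-up HP values: eight ordinary ones plus two extremes.
hp <- c(45, 50, 55, 60, 65, 70, 75, 80, 1, 255)

q1 <- unname(quantile(hp, 0.25))   # 51.25
q3 <- unname(quantile(hp, 0.75))   # 73.75
fences <- c(lower = q1 - 1.5 * IQR(hp), upper = q3 + 1.5 * IQR(hp))
fences                             # lower 17.5, upper 107.5

hp[is_outlier(hp)]                 # 1 and 255
```

Note that the helper computes one pair of fences from the whole vector it is given, so in the tables here "outlier" means an outlier among all 800 rows.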

pokemon %>% 
  ggplot(aes(x = Type1, y = Attack, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "Atk by Primary Type")

pokemon %>% 
  filter(is_outlier(Attack) == TRUE) %>% 
  mutate(AtkPercent = round(Attack / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Attack, AtkPercent)
## # A tibble: 7 x 7
##   PokedexNum Name                    Type1   Type2    Total Attack AtkPercent
##        <int> <chr>                   <chr>   <chr>    <int>  <int>      <dbl>
## 1        150 MewtwoMega Mewtwo X     Psychic Fighting   780    190       0.24
## 2        214 HeracrossMega Heracross Bug     Fighting   600    185       0.31
## 3        383 GroudonPrimal Groudon   Ground  Fire       770    180       0.23
## 4        384 RayquazaMega Rayquaza   Dragon  Flying     780    180       0.23
## 5        386 DeoxysAttack Forme      Psychic <NA>       600    180       0.3 
## 6        445 GarchompMega Garchomp   Dragon  Ground     700    170       0.24
## 7        646 KyuremBlack Kyurem      Dragon  Ice        700    170       0.24

We see a number of outliers here. The two Psychic types that are off the charts are Mega Mewtwo X and Deoxys (Attack Forme), two Pokemon that were essentially made to be overpowered in the Attack stat. Mega Heracross is the Bug outlier and shows the absurdity of Mega evolutions. Heracross is not a stellar Pokemon by most measures, yet its Mega evolution’s Attack stat rivals the highest in the game. The Normal lower outlier is Chansey, with a whopping Attack stat of 5. We see more variation between types in the Attack stat than we did in the HP stat. Fairy and Psychic have below-average Attack stats because they are known for their SpAtk. This fits in line with their typing and appearances. Fighting has an above-average Attack stat, again in line with the bulky, muscular appearance of most Fighting types.
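Chansey’s Attack of 5 shows up as a point on the Normal boxplot but not in the outlier table. That is because the boxplot draws fences per type, while the table applies `is_outlier()` to all rows at once. A sketch of the difference on made-up numbers, using base R’s `ave()` to apply the rule within each group:

```r
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

# Made-up Attack values for two types; the 5 mimics a Chansey-like case.
toy <- data.frame(
  Type1  = rep(c("Normal", "Fighting"), each = 6),
  Attack = c(5, 60, 65, 70, 75, 80,   40, 90, 110, 130, 150, 170)
)

# Global rule: one pair of fences for all rows.
toy$global <- is_outlier(toy$Attack)

# Per-type rule: ave() runs the function separately within each Type1 group.
# ave() returns numbers (0/1) here, so convert back to logical.
toy$by_type <- as.logical(ave(toy$Attack, toy$Type1, FUN = is_outlier))

toy[toy$global | toy$by_type, ]
```

With dplyr, the equivalent would be a `group_by(Type1)` before the `filter()`. Which version is "right" depends on whether you want outliers relative to all Pokemon or relative to their own type.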

pokemon %>% 
  ggplot(aes(x = Type1, y = Defense, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "DEF by Primary Type")

pokemon %>% 
  filter(is_outlier(Defense) == TRUE) %>% 
  mutate(DefPercent = round(Defense / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Defense, DefPercent)
## # A tibble: 13 x 7
##    PokedexNum Name                  Type1   Type2   Total Defense DefPercent
##         <int> <chr>                 <chr>   <chr>   <int>   <int>      <dbl>
##  1         80 SlowbroMega Slowbro   Water   Psychic   590     180       0.31
##  2         91 Cloyster              Water   Ice       525     180       0.34
##  3         95 Onix                  Rock    Ground    385     160       0.42
##  4        208 Steelix               Steel   Ground    510     200       0.39
##  5        208 SteelixMega Steelix   Steel   Ground    610     230       0.38
##  6        213 Shuckle               Bug     Rock      505     230       0.46
##  7        306 Aggron                Steel   Rock      530     180       0.34
##  8        306 AggronMega Aggron     Steel   <NA>      630     230       0.37
##  9        377 Regirock              Rock    <NA>      580     200       0.34
## 10        383 GroudonPrimal Groudon Ground  Fire      770     160       0.21
## 11        386 DeoxysDefense Forme   Psychic <NA>      600     160       0.27
## 12        411 Bastiodon             Rock    Steel     495     168       0.34
## 13        713 Avalugg               Ice     <NA>      514     184       0.36

We continue to see this pattern linking a type’s appearance to its stats. Steel and Rock (and to a lesser extent, Ground) are usually made out of hard material, and this is reflected in the Defense stat. Shuckle is an exception to the Bug rule: bugs are typically small creatures, but Shuckle has a rock-hard shell, hence its secondary typing. In fact, many of these outliers are depicted with either some hard material (e.g. steel/rock) or a shell.

pokemon %>% 
  ggplot(aes(x = Type1, y = SpAtk, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "SpAtk by Primary Type")

pokemon %>% 
  filter(is_outlier(SpAtk) == TRUE) %>% 
  mutate(SpAtkPercent = round(SpAtk / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, SpAtk, SpAtkPercent)
## # A tibble: 10 x 7
##    PokedexNum Name                    Type1    Type2  Total SpAtk SpAtkPercent
##         <int> <chr>                   <chr>    <chr>  <int> <int>        <dbl>
##  1         65 AlakazamMega Alakazam   Psychic  <NA>     590   175         0.3 
##  2         94 GengarMega Gengar       Ghost    Poison   600   170         0.28
##  3        150 MewtwoMega Mewtwo Y     Psychic  <NA>     780   194         0.25
##  4        181 AmpharosMega Ampharos   Electric Dragon   610   165         0.27
##  5        282 GardevoirMega Gardevoir Psychic  Fairy    618   165         0.27
##  6        382 KyogrePrimal Kyogre     Water    <NA>     770   180         0.23
##  7        384 RayquazaMega Rayquaza   Dragon   Flying   780   180         0.23
##  8        386 DeoxysAttack Forme      Psychic  <NA>     600   180         0.3 
##  9        646 KyuremWhite Kyurem      Dragon   Ice      700   170         0.24
## 10        720 HoopaHoopa Unbound      Psychic  Dark     680   170         0.25

We see more variance with the Special Attack stat than with any other stat. On top of the type-to-type variance, we see large variance within types, especially within the Psychic, Dragon, Electric, and Water typings.
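"Variance within a type" and "variance between types" can be put into numbers rather than read off the boxplots. Here is a minimal base R sketch on made-up SpAtk values. The choice of IQR and of the median gap as summaries is mine.

```r
# Made-up SpAtk values for two types with very different spreads.
toy <- data.frame(
  Type1 = rep(c("Psychic", "Bug"), each = 5),
  SpAtk = c(30, 70, 105, 154, 194,   25, 35, 40, 50, 60)
)

# Within-type spread: one IQR / standard deviation per type.
within_iqr <- tapply(toy$SpAtk, toy$Type1, IQR)
within_sd  <- tapply(toy$SpAtk, toy$Type1, sd)

# Between-type spread: how far apart the type medians are.
type_median   <- tapply(toy$SpAtk, toy$Type1, median)
between_range <- diff(range(type_median))

within_iqr
between_range
```

Running the same `tapply()` calls on `pokemon$SpAtk` and `pokemon$Type1` would give one number per type to back up what the boxplots suggest.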

pokemon %>% 
  ggplot(aes(x = Type1, y = SpDef, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "SpDef by Primary Type")

pokemon %>% 
  filter(is_outlier(SpDef) == TRUE) %>% 
  mutate(SpDefPercent = round(SpDef / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, SpDef, SpDefPercent)
## # A tibble: 7 x 7
##   PokedexNum Name                Type1   Type2  Total SpDef SpDefPercent
##        <int> <chr>               <chr>   <chr>  <int> <int>        <dbl>
## 1        213 Shuckle             Bug     Rock     505   230         0.46
## 2        249 Lugia               Psychic Flying   680   154         0.23
## 3        250 Ho-oh               Fire    Flying   680   154         0.23
## 4        378 Regice              Ice     <NA>     580   200         0.34
## 5        382 KyogrePrimal Kyogre Water   <NA>     770   160         0.21
## 6        386 DeoxysDefense Forme Psychic <NA>     600   160         0.27
## 7        671 Florges             Fairy   <NA>     552   154         0.28

Special Defense shows less variance between types. Again, we see that Psychic and Dragon types have higher than average Special Defense stats. Let’s remember that the reason we continue to see these two typings at the top of many of our charts is that many of the best Pokemon have these typings. Fairy and Electric also appear to be higher than average, but not as noticeably.

pokemon %>% 
  ggplot(aes(x = Type1, y = Speed, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none")  + 
  labs(title = "Speed by Primary Type")

pokemon %>% 
  filter(is_outlier(Speed) == TRUE) %>% 
  mutate(SpeedPercent = round(Speed / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Speed, SpeedPercent)
## # A tibble: 2 x 7
##   PokedexNum Name              Type1   Type2  Total Speed SpeedPercent
##        <int> <chr>             <chr>   <chr>  <int> <int>        <dbl>
## 1        291 Ninjask           Bug     Flying   456   160         0.35
## 2        386 DeoxysSpeed Forme Psychic <NA>     600   180         0.3

Flying’s high boxplot can again be attributed to a small sample size, while Psychic’s wide variance can be attributed to the different kinds of Pokemon found in the typing. Electric and Dragon have high Speed, while Fairy is below average.

pokemon %>% 
  ggplot(aes(x = Type1, y = AtkTotal, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "Total Atk by Primary Type")

pokemon %>% 
  filter(is_outlier(AtkTotal) == TRUE) %>% 
  mutate(AtkPercent = round(AtkTotal / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, AtkTotal, AtkPercent)
## # A tibble: 16 x 7
##    PokedexNum Name                  Type1   Type2    Total AtkTotal AtkPercent
##         <int> <chr>                 <chr>   <chr>    <int>    <dbl>      <dbl>
##  1        150 MewtwoMega Mewtwo X   Psychic Fighting   780      344      0.44 
##  2        150 MewtwoMega Mewtwo Y   Psychic <NA>       780      344      0.44 
##  3        257 BlazikenMega Blaziken Fire    Fighting   630      290      0.46 
##  4        381 LatiosMega Latios     Dragon  Psychic    700      290      0.41 
##  5        382 KyogrePrimal Kyogre   Water   <NA>       770      330      0.43 
##  6        383 GroudonPrimal Groudon Ground  Fire       770      330      0.43 
##  7        384 Rayquaza              Dragon  Flying     680      300      0.44 
##  8        384 RayquazaMega Rayquaza Dragon  Flying     780      360      0.46 
##  9        386 DeoxysNormal Forme    Psychic <NA>       600      300      0.5  
## 10        386 DeoxysAttack Forme    Psychic <NA>       600      360      0.6  
## 11        445 GarchompMega Garchomp Dragon  Ground     700      290      0.41 
## 12        646 KyuremBlack Kyurem    Dragon  Ice        700      290      0.41 
## 13        646 KyuremWhite Kyurem    Dragon  Ice        700      290      0.41 
## 14        681 AegislashBlade Forme  Steel   Ghost      520      300      0.580
## 15        719 DiancieMega Diancie   Rock    Fairy      700      320      0.46 
## 16        720 HoopaHoopa Unbound    Psychic Dark       680      330      0.49

Here we examine the combined offensive prowess (Attack + SpAtk) to emphasize what a powerhouse the Dragon type is. Pokemon with high Atk and SpAtk stats are typically known as “Mixed Attackers”, while those with high Atk or high SpAtk would be “Physical Sweepers” or “Special Sweepers”, respectively. We notice the Psychic type falls in line with the rest of the types when we measure Total Atk because Psychic types typically have low Atk stats.
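To make the Mixed/Physical/Special distinction concrete, here is a rough sketch. The 20-point `gap` threshold and the `attacker_style()` helper are purely my own rule of thumb and not an official or competitive definition. The stats are the ones shown in the tables in this post.

```r
# Hypothetical rule of thumb: "Mixed" when Attack and SpAtk are within `gap`
# points of each other, otherwise whichever stat is higher.
attacker_style <- function(attack, sp_atk, gap = 20) {
  ifelse(abs(attack - sp_atk) <= gap, "Mixed",
         ifelse(attack > sp_atk, "Physical", "Special"))
}

name   <- c("MewtwoMega Mewtwo X", "MewtwoMega Mewtwo Y",
            "DeoxysAttack Forme", "AlakazamMega Alakazam")
attack <- c(190, 150, 180, 50)
sp_atk <- c(154, 194, 180, 175)

data.frame(name,
           AtkTotal = attack + sp_atk,
           style    = attacker_style(attack, sp_atk))
```

Both Mega Mewtwo forms have the same AtkTotal of 344 but land in different styles, which is the kind of detail the combined stat hides.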

pokemon %>% 
  ggplot(aes(x = Type1, y = DefTotal, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none")  + 
  labs(title = "Total Def by Primary Type")

pokemon %>% 
  filter(is_outlier(DefTotal) == TRUE) %>% 
  mutate(DefPercent = round(DefTotal / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, DefTotal, DefPercent)
## # A tibble: 12 x 7
##    PokedexNum Name                  Type1   Type2  Total DefTotal DefPercent
##         <int> <chr>                 <chr>   <chr>  <int>    <dbl>      <dbl>
##  1        208 SteelixMega Steelix   Steel   Ground   610      325      0.53 
##  2        213 Shuckle               Bug     Rock     505      460      0.91 
##  3        306 AggronMega Aggron     Steel   <NA>     630      310      0.49 
##  4        377 Regirock              Rock    <NA>     580      300      0.52 
##  5        378 Regice                Ice     <NA>     580      300      0.52 
##  6        379 Registeel             Steel   <NA>     580      300      0.52 
##  7        386 DeoxysDefense Forme   Psychic <NA>     600      320      0.53 
##  8        411 Bastiodon             Rock    Steel    495      306      0.62 
##  9        476 Probopass             Rock    Steel    525      295      0.56 
## 10        681 AegislashShield Forme Steel   Ghost    520      300      0.580
## 11        703 Carbink               Rock    Fairy    500      300      0.6  
## 12        719 Diancie               Rock    Fairy    600      300      0.5

Again, we see Steel, Rock, and Dragon rise to the top, but not as significantly as before, because Steel and Rock’s high Def stats are offset by their average SpDef stats.

# Legendary Birds Color Scheme
pokemon %>% 
  ggplot(aes(x = classification, y = Total, color = classification, fill = classification)) + 
  geom_boxplot(show.legend = FALSE) + 
  scale_fill_manual(values = c("#d50808", "#ffd541", "#94c5ff"),
                    guide = "none") +
  scale_color_manual(values = c("#ffc54a", "#9c7b10", "#005273"),
                    guide = "none") + 
  labs(title = "Total Stats by Classification")

We see how comparable the Legendary and Mega Pokemon are. The introduction of Mega Pokemon essentially introduced another evolution for fan-favorite Pokemon, giving them near-Legendary stats. Other overpowered formes were introduced as well, such as Primal for Kyogre and Groudon. Also note the overlap between normal Pokemon and Legendary/Mega. This overlap is mostly due to pseudo-legendary Pokemon. These Pokemon are typically Dragon types found late in the game whose base stats sum to 600, a very high total.

pokemon %>% 
 ggplot(aes(x=Total)) +
   geom_density(alpha=0.5, aes(fill=Type1)) +
   facet_wrap(~Type1) + 
   labs(x="Total", y="Density") +
  scale_fill_manual(values = type_colors,
                    guide = "none")

Most types have a peak or two, mostly explained by the average stat totals at each stage as Pokemon evolve. The Psychic type appears to be an exception, as there does not appear to be a peak but rather a smooth, nearly uniform density. Steel, Fairy, and Dark do not have as many evolutions as some of the other types, so they are basically unimodal.

# Generation Mascot Color Scheme
pokemon %>% 
  ggplot(aes(x = Generation, y = Total, color = Generation, fill = Generation)) +
  geom_boxplot() + 
  scale_fill_manual(values = c("#2062ac", "#deac00", "#ff2029", "#205a94", "#181820", "#6275b9"),
                    guide = "none") +
  scale_color_manual(values = c("#F2684A", "#9cace6", "#313973", "#bd6ad5", "#bdbdd5", "#e9351c"),
                    guide = "none") + 
  labs(title = "Total Stats by Generation")

Interestingly enough, there does not appear to be a significant difference in the stat totals between the 6 generations. Generation 4 has a slightly higher average, and I believe this might be due to the introduction of many new evolutions for earlier Pokemon, such as Magmortar, Electivire, etc.
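"No significant difference" here is judged by eye from the boxplots. One way to check it more formally would be a rank-based test such as Kruskal-Wallis. The sketch below runs on simulated numbers only (NOT the real data), just to show the call. It also assumes rows are independent, which evolutionary lines and alternate forms arguably violate.

```r
set.seed(1)

# Simulated stand-in for Total by Generation; these are NOT the real values.
sim <- data.frame(
  Generation = factor(rep(1:3, each = 30)),
  Total      = round(rnorm(90, mean = 430, sd = 110))
)

# Kruskal-Wallis rank sum test: do the groups share the same distribution?
fit <- kruskal.test(Total ~ Generation, data = sim)
fit$p.value
```

On the real data the call would be `kruskal.test(Total ~ Generation, data = pokemon)`. A large p-value would be consistent with the boxplots, though it would not prove the generations are identical.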

pokemon %>% 
  mutate(MaxAtk = ifelse(Attack > SpAtk, Attack, SpAtk)) %>% 
  filter(MaxAtk > 100) %>% 
  ggplot(aes(x = Speed, y = MaxAtk)) +
  geom_point(aes(color = Type1)) +
  geom_smooth(method = 'lm') +
  scale_color_manual(values = type_colors)  + 
  labs(title = "Offensive Potential (Speed vs. MaxAtk)")

pokemon %>% 
  mutate(MaxAtk = ifelse(Attack > SpAtk, Attack, SpAtk)) %>% 
  filter(MaxAtk >= 160 & Speed > 120) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Attack, SpAtk, MaxAtk,Speed)
## # A tibble: 5 x 9
##   PokedexNum Name                Type1   Type2   Total Attack SpAtk MaxAtk Speed
##        <int> <chr>               <chr>   <chr>   <int>  <int> <int>  <int> <int>
## 1         65 AlakazamMega Alaka… Psychic <NA>      590     50   175    175   150
## 2         94 GengarMega Gengar   Ghost   Poison    600     65   170    170   130
## 3        150 MewtwoMega Mewtwo X Psychic Fighti…   780    190   154    190   130
## 4        150 MewtwoMega Mewtwo Y Psychic <NA>      780    150   194    194   140
## 5        386 DeoxysAttack Forme  Psychic <NA>      600    180   180    180   150
pokemon %>% 
  mutate(MaxAtk = ifelse(Attack > SpAtk, Attack, SpAtk)) %>% 
  filter(MaxAtk >= 100 & Speed < 40) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Attack, SpAtk, MaxAtk,Speed)
## # A tibble: 20 x 9
##    PokedexNum Name                Type1   Type2  Total Attack SpAtk MaxAtk Speed
##         <int> <chr>               <chr>   <chr>  <int>  <int> <int>  <int> <int>
##  1         80 Slowbro             Water   Psych…   490     75   100    100    30
##  2         80 SlowbroMega Slowbro Water   Psych…   590     75   130    130    30
##  3        143 Snorlax             Normal  <NA>     540    110    65    110    30
##  4        185 Sudowoodo           Rock    <NA>     410    100    30    100    30
##  5        192 Sunflora            Grass   <NA>     425     75   105    105    30
##  6        199 Slowking            Water   Psych…   490     75   100    100    30
##  7        208 SteelixMega Steelix Steel   Ground   610    125    55    125    30
##  8        323 CameruptMega Camer… Fire    Ground   560    120   145    145    20
##  9        328 Trapinch            Ground  <NA>     290    100    45    100    10
## 10        460 AbomasnowMega Abom… Grass   Ice      594    132   132    132    30
## 11        518 Musharna            Psychic <NA>     487     55   107    107    29
## 12        525 Boldore             Rock    <NA>     390    105    50    105    20
## 13        526 Gigalith            Rock    <NA>     515    135    60    135    25
## 14        565 Carracosta          Water   Rock     495    108    83    108    32
## 15        577 Solosis             Psychic <NA>     290     30   105    105    20
## 16        578 Duosion             Psychic <NA>     370     40   125    125    30
## 17        579 Reuniclus           Psychic <NA>     490     65   125    125    30
## 18        589 Escavalier          Bug     Steel    495    135    60    135    20
## 19        680 Doublade            Steel   Ghost    448    110    45    110    35
## 20        713 Avalugg             Ice     <NA>     514    117    44    117    28

Here, I wanted to examine Offensive Potential by looking at Pokemon with high Speed and Atk/SpAtk stats. The issue with looking at only these two stats is that high values usually come with poor Defense or HP. Pokemon with high Speed and Atk/SpAtk but low HP/Def/SpDef are usually known as “Glass Cannons”. Mega Alakazam is a great example of this.
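To make the “Glass Cannon” idea concrete, here is a rough sketch that adds a bulk check to the offensive filter. The `Bulk` column (HP + Defense + SpDef) and all three cutoffs are my own arbitrary assumptions, and the stat lines are copied from tables elsewhere in this post so the example runs on its own.

```r
library(dplyr)

# Stat lines copied from the tables in this post.
sample_mons <- tibble(
  Name    = c("MewtwoMega Mewtwo X", "MewtwoMega Mewtwo Y", "Snorlax", "Ponyta"),
  HP      = c(106, 106, 160, 50),
  Attack  = c(190, 150, 110, 85),
  Defense = c(100,  70,  65, 55),
  SpAtk   = c(154, 194,  65, 65),
  SpDef   = c(100, 120, 110, 65),
  Speed   = c(130, 140,  30, 90)
)

glass_cannons <- sample_mons %>%
  mutate(MaxAtk = pmax(Attack, SpAtk),       # same result as the ifelse() version
         Bulk   = HP + Defense + SpDef) %>%  # hypothetical "bulk" measure
  filter(MaxAtk >= 150, Speed >= 120, Bulk <= 300) %>%  # arbitrary cutoffs
  select(Name, MaxAtk, Speed, Bulk)

glass_cannons
```

Swap `sample_mons` for `pokemon` to run it on the full data. Of these four, only Mega Mewtwo Y passes all three cutoffs, which shows how sensitive the label is to where you draw the lines.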

pokemon %>% 
  mutate(MaxDef = ifelse(Defense > SpDef, Defense, SpDef)) %>% 
  ggplot(aes(x = HP, y = MaxDef)) +
  geom_point(aes(color = Type1)) +
  geom_smooth(method = 'lm') +
  scale_color_manual(values = type_colors)  + 
  labs(title = "Wall Potential (HP vs. MaxDef)")

pokemon %>% 
  mutate(MaxDef = ifelse(Defense > SpDef, Defense, SpDef)) %>% 
  filter(HP >= 150) %>% 
  arrange(-HP) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, HP, Defense, SpDef, MaxDef)
## # A tibble: 10 x 9
##    PokedexNum Name                Type1   Type2 Total    HP Defense SpDef MaxDef
##         <int> <chr>               <chr>   <chr> <int> <int>   <int> <int>  <int>
##  1        242 Blissey             Normal  <NA>    540   255      10   135    135
##  2        113 Chansey             Normal  <NA>    450   250       5   105    105
##  3        202 Wobbuffet           Psychic <NA>    405   190      58    58     58
##  4        321 Wailord             Water   <NA>    500   170      45    45     45
##  5        594 Alomomola           Water   <NA>    470   165      80    45     80
##  6        143 Snorlax             Normal  <NA>    540   160      65   110    110
##  7        289 Slaking             Normal  <NA>    670   150     100    65    100
##  8        426 Drifblim            Ghost   Flyi…   498   150      44    54     54
##  9        487 GiratinaAltered Fo… Ghost   Drag…   680   150     120   120    120
## 10        487 GiratinaOrigin For… Ghost   Drag…   680   150     100   100    100
pokemon %>% 
  mutate(MaxDef = ifelse(Defense > SpDef, Defense, SpDef)) %>% 
  filter(MaxDef >= 150) %>% 
  arrange(-MaxDef) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, HP, Defense, SpDef, MaxDef)
## # A tibble: 28 x 9
##    PokedexNum Name                Type1 Type2   Total    HP Defense SpDef MaxDef
##         <int> <chr>               <chr> <chr>   <int> <int>   <int> <int>  <int>
##  1        208 SteelixMega Steelix Steel Ground    610    75     230    95    230
##  2        213 Shuckle             Bug   Rock      505    20     230   230    230
##  3        306 AggronMega Aggron   Steel <NA>      630    70     230    80    230
##  4        208 Steelix             Steel Ground    510    75     200    65    200
##  5        377 Regirock            Rock  <NA>      580    80     200   100    200
##  6        378 Regice              Ice   <NA>      580    80     100   200    200
##  7        713 Avalugg             Ice   <NA>      514    95     184    46    184
##  8         80 SlowbroMega Slowbro Water Psychic   590    95     180    80    180
##  9         91 Cloyster            Water Ice       525    50     180    45    180
## 10        306 Aggron              Steel Rock      530    70     180    60    180
## # … with 18 more rows

Here I wanted to look for “Walls”: Pokemon with high HP and Def/SpDef. These Pokemon are usually part of a “stall” strategy, where they inflict status conditions and use healing moves while chipping away at your HP. Like the Offensive graph, this one ignores the other stats; however, these “Walls” usually have poor offensive stats.
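One thing the HP and MaxDef tables hide is which side a Wall actually covers. A quick sketch: multiply HP by each defensive stat to get a rough physical and special bulk score. The `PhysBulk` and `SpecBulk` names and the multiplication itself are my own simplification, not the in-game damage formula. The stat lines are copied from the tables above.

```r
library(dplyr)

# Stat lines copied from the tables in this post.
walls <- tibble(
  Name    = c("Blissey", "Shuckle", "Steelix", "Snorlax"),
  HP      = c(255,  20,  75, 160),
  Defense = c( 10, 230, 200,  65),
  SpDef   = c(135, 230,  65, 110)
)

wall_scores <- walls %>%
  mutate(PhysBulk = HP * Defense,  # rough proxy, not the real damage formula
         SpecBulk = HP * SpDef) %>%
  arrange(desc(SpecBulk))

wall_scores
```

On this proxy Blissey tops the special side but sits at the bottom physically, while Steelix is the reverse, so a high MaxDef alone does not tell you what a Wall can actually switch into.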

# Scary Max
pokemon %>% 
  summarise(n = max(PokedexNum + 1),
            name = "OP",
            Total = max(HP)+max(Attack)+max(SpAtk)+max(Defense)+max(SpDef) + max(Speed),
            HP = max(HP),
            Attack = max(Attack),
            SpAtk = max(SpAtk),
            Defense = max(Defense),
            SpDef = max(SpDef),
            Speed = max(Speed))
## # A tibble: 1 x 9
##       n name  Total    HP Attack SpAtk Defense SpDef Speed
##   <dbl> <chr> <int> <int>  <int> <int>   <int> <int> <int>
## 1   722 OP     1279   255    190   194     230   230   180
pokemon %>% 
  filter(Total == max(Total))
## # A tibble: 3 x 18
##   PokedexNum Name  Type1 Type2 Total    HP Attack Defense SpAtk SpDef Speed
##        <int> <chr> <chr> <chr> <int> <int>  <int>   <int> <int> <int> <int>
## 1        150 Mewt… Psyc… Figh…   780   106    190     100   154   100   130
## 2        150 Mewt… Psyc… <NA>    780   106    150      70   194   120   140
## 3        384 Rayq… Drag… Flyi…   780   105    180     100   180   100   115
## # … with 7 more variables: Generation <fct>, Legendary <lgl>, AtkTotal <dbl>,
## #   DefTotal <dbl>, isMega <lgl>, isMultiType <lgl>, classification <fct>
pokemon %>% 
  summarise(n = min(PokedexNum - 1),
            name = "OP",
            Total = min(HP)+min(Attack)+min(SpAtk)+min(Defense)+min(SpDef) + min(Speed),
            HP = min(HP),
            Attack = min(Attack),
            SpAtk = min(SpAtk),
            Defense = min(Defense),
            SpDef = min(SpDef),
            Speed = min(Speed))
## # A tibble: 1 x 9
##       n name  Total    HP Attack SpAtk Defense SpDef Speed
##   <dbl> <chr> <int> <int>  <int> <int>   <int> <int> <int>
## 1     0 OP       46     1      5    10       5    20     5
pokemon %>% 
  filter(Total == min(Total))
## # A tibble: 1 x 18
##   PokedexNum Name  Type1 Type2 Total    HP Attack Defense SpAtk SpDef Speed
##        <int> <chr> <chr> <chr> <int> <int>  <int>   <int> <int> <int> <int>
## 1        191 Sunk… Grass <NA>    180    30     30      30    30    30    30
## # … with 7 more variables: Generation <fct>, Legendary <lgl>, AtkTotal <dbl>,
## #   DefTotal <dbl>, isMega <lgl>, isMultiType <lgl>, classification <fct>
pokemon %>% 
  summarise(n = as.integer(mean(PokedexNum - 1)),
            name = "OP",
            Total = as.integer(mean(HP))+as.integer(mean(Attack))+as.integer(mean(SpAtk))+as.integer(mean(Defense))+as.integer(mean(SpDef)) + as.integer(mean(Speed)),
            HP = as.integer(mean(HP)),
            Attack = as.integer(mean(Attack)),
            SpAtk = as.integer(mean(SpAtk)),
            Defense = as.integer(mean(Defense)),
            SpDef = as.integer(mean(SpDef)),
            Speed = as.integer(mean(Speed)))
## # A tibble: 1 x 9
##       n name  Total    HP Attack SpAtk Defense SpDef Speed
##   <int> <chr> <int> <int>  <int> <int>   <int> <int> <int>
## 1   361 OP      432    69     79    72      73    71    68
pokemon %>% 
  filter(Total == 432)
## # A tibble: 0 x 18
## # … with 18 variables: PokedexNum <int>, Name <chr>, Type1 <chr>, Type2 <chr>,
## #   Total <int>, HP <int>, Attack <int>, Defense <int>, SpAtk <int>,
## #   SpDef <int>, Speed <int>, Generation <fct>, Legendary <lgl>,
## #   AtkTotal <dbl>, DefTotal <dbl>, isMega <lgl>, isMultiType <lgl>,
## #   classification <fct>
pokemon %>% 
  summarise(n = as.integer(median(PokedexNum - 1)),
            name = "OP",
            Total = as.integer(median(HP))+as.integer(median(Attack))+as.integer(median(SpAtk))+as.integer(median(Defense))+as.integer(median(SpDef)) + as.integer(median(Speed)),
            HP = as.integer(median(HP)),
            Attack = as.integer(median(Attack)),
            SpAtk = as.integer(median(SpAtk)),
            Defense = as.integer(median(Defense)),
            SpDef = as.integer(median(SpDef)),
            Speed = as.integer(median(Speed)))
## # A tibble: 1 x 9
##       n name  Total    HP Attack SpAtk Defense SpDef Speed
##   <int> <chr> <int> <int>  <int> <int>   <int> <int> <int>
## 1   363 OP      410    65     75    65      70    70    65
pokemon %>% 
  filter(Total == 410)
## # A tibble: 9 x 18
##   PokedexNum Name  Type1 Type2 Total    HP Attack Defense SpAtk SpDef Speed
##        <int> <chr> <chr> <chr> <int> <int>  <int>   <int> <int> <int> <int>
## 1         77 Pony… Fire  <NA>    410    50     85      55    65    65    90
## 2        185 Sudo… Rock  <NA>    410    70    100     115    30    65    30
## 3        219 Magc… Fire  Rock    410    50     50     120    80    80    30
## 4        247 Pupi… Rock  Grou…   410    70     84      70    65    70    51
## 5        308 Medi… Figh… Psyc…   410    60     60      75    60    75    80
## 6        364 Seal… Ice   Water   410    90     60      70    75    70    45
## 7        400 Biba… Norm… Water   410    79     85      60    55    60    71
## 8        444 Gabi… Drag… Grou…   410    68     90      65    50    55    82
## 9        611 Frax… Drag… <NA>    410    66    117      70    40    50    67
## # … with 7 more variables: Generation <fct>, Legendary <lgl>, AtkTotal <dbl>,
## #   DefTotal <dbl>, isMega <lgl>, isMultiType <lgl>, classification <fct>

This is just a fun look at what a hypothetical Pokemon built from the max/min/mean/median of every stat would look like, and how it measures up against actual Pokemon.
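The four summaries above repeat the same pattern with a different function each time, so they could be folded into one helper. This is a sketch assuming dplyr 1.0 or later (for `across()`); the helper name `frankenmon` is made up, and the sample rows are copied from the tables in this post.

```r
library(dplyr)

# Stat lines copied from the tables in this post.
sample_mons <- tibble(
  Name    = c("MewtwoMega Mewtwo X", "MewtwoMega Mewtwo Y", "Snorlax", "Sunkern"),
  HP      = c(106, 106, 160, 30),
  Attack  = c(190, 150, 110, 30),
  Defense = c(100,  70,  65, 30),
  SpAtk   = c(154, 194,  65, 30),
  SpDef   = c(100, 120, 110, 30),
  Speed   = c(130, 140,  30, 30)
)

stat_cols <- c("HP", "Attack", "SpAtk", "Defense", "SpDef", "Speed")

# Hypothetical helper: apply one summary function to every stat column,
# then add the per-stat results up into a Total.
frankenmon <- function(df, fn) {
  df %>%
    summarise(across(all_of(stat_cols), fn)) %>%
    mutate(Total = HP + Attack + SpAtk + Defense + SpDef + Speed)
}

frankenmon(sample_mons, max)
frankenmon(sample_mons, min)
frankenmon(sample_mons, median)
```

To match the integer truncation used above, pass `function(x) as.integer(mean(x))` as `fn`. Even in this tiny sample the per-stat maximums add up to well over the best real Total, since no single Pokemon is the best at everything.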

# Chansey Color Scheme
pokemon %>% 
ggplot(aes(x=HP)) +
  geom_histogram(binwidth=4, fill="#ffacac", colour="#ff835a") + 
  labs(x="HP", y="Frequency") 

# Landorus Color Scheme
pokemon %>% 
ggplot(aes(x=Attack)) +
  geom_histogram(binwidth=4, fill="#f67b41", colour="#83624a") + 
  labs(x="Attack", y="Frequency") 

# Greninja
pokemon %>% 
ggplot(aes(x=SpAtk)) +
  geom_histogram(binwidth=4, fill="#354698", colour="#e7788d") + 
  labs(x="SpAtk", y="Frequency") 

# Steelix
pokemon %>% 
ggplot(aes(x=Defense)) +
  geom_histogram(binwidth=4, fill="#7b94a4", colour="#dee6de") + 
  labs(x="Defense", y="Frequency") 

# Shuckle
pokemon %>% 
ggplot(aes(x=SpDef)) +
  geom_histogram(binwidth=4, fill="#b43129", colour="#ffff5a") + 
  labs(x="SpDef", y="Frequency") 

# Deoxys
pokemon %>% 
ggplot(aes(x=Speed)) +
  geom_histogram(binwidth=4, fill="#5294ac", colour="#ff734a") + 
  labs(x="Speed", y="Frequency") 

# Mewtwo
pokemon %>% 
ggplot(aes(x=Total)) +
  geom_histogram(binwidth=10, fill="#6a319c", colour="#b4acc5") + 
  labs(x="Total", y="Frequency") 

Histograms displaying the spread of each stat.
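If you would rather see the six individual stats side by side, one alternative is to reshape the data to long form and facet. This is a sketch assuming tidyr 1.0 or later (for `pivot_longer()`); `stat_fills` is a name I made up, reusing the fill colors from the histograms above, and the sample rows are copied from tables in this post.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# A few stat lines copied from the tables in this post, standing in for `pokemon`.
sample_mons <- tibble(
  Name    = c("MewtwoMega Mewtwo X", "Snorlax", "Sunkern", "Ponyta"),
  HP      = c(106, 160, 30, 50),
  Attack  = c(190, 110, 30, 85),
  Defense = c(100,  65, 30, 55),
  SpAtk   = c(154,  65, 30, 65),
  SpDef   = c(100, 110, 30, 65),
  Speed   = c(130,  30, 30, 90)
)

# Fill colors reused from the individual histograms.
stat_fills <- c(HP = "#ffacac", Attack = "#f67b41", SpAtk = "#354698",
                Defense = "#7b94a4", SpDef = "#b43129", Speed = "#5294ac")

# One row per Pokemon per stat.
stats_long <- sample_mons %>%
  pivot_longer(cols = all_of(names(stat_fills)),
               names_to = "Stat", values_to = "Value")

stat_hist <- ggplot(stats_long, aes(x = Value, fill = Stat)) +
  geom_histogram(binwidth = 4, colour = "black") +
  scale_fill_manual(values = stat_fills, guide = "none") +
  facet_wrap(~Stat) +
  labs(x = "Stat value", y = "Frequency")

stat_hist
```

The trade-off is that every panel shares one x-axis and one outline color, so you lose the two-tone color schemes, but comparing the shapes across stats gets easier.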

pokemon %>% 
  ggplot(aes(x=HP, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="HP", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=Attack, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="Attack", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=SpAtk, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="SpAtk", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=Defense, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="Defense", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=SpDef, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="SpDef", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=Speed, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="Speed", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=Total, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="Total", y="Density", title = "Legendary Comparison")

Density graphs showing the disparity between normal Pokemon and Legendary Pokemon for all stats.
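To put a number on that gap, a grouped summary is a quick companion to the density plots. This is a sketch on a handful of rows: the Totals are copied from tables in this post, while the Legendary flags are filled in by hand and are an assumption about how the data set marks these rows.

```r
library(dplyr)

# Totals copied from the tables in this post; Legendary flags set by hand
# (an assumption about how the data set marks these rows).
sample_mons <- tibble(
  Name      = c("MewtwoMega Mewtwo X", "MewtwoMega Mewtwo Y",
                "Snorlax", "Sunkern", "Ponyta"),
  Total     = c(780, 780, 540, 180, 410),
  Legendary = c(TRUE, TRUE, FALSE, FALSE, FALSE)
)

legendary_gap <- sample_mons %>%
  group_by(Legendary) %>%
  summarise(n = n(), median_total = median(Total))

legendary_gap
```

Running the same pipeline on `pokemon` would give the real figures. I used the median here so a few extreme Totals do not drag the comparison around.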

pokemon %>% 
  group_by(Generation) %>% 
  summarise(avg = as.integer(mean(Total))) %>% 
  ggplot(aes(x=Generation, y = avg, group = 1)) +
  geom_line() +
  geom_point(color = "red") +
  labs(title = "Average Total for each Generation")

Mapping the average Total in each Generation (similar to the boxplots above).

pokemon %>%
  group_by(Generation) %>%
  summarize(HP=mean(HP),
            Attack=mean(Attack),
            Defense=mean(Defense),
            SpAtk=mean(SpAtk),
            SpDef=mean(SpDef),
            Speed=mean(Speed)) %>%
  gather(Stats, value, 2:7) %>%
  ggplot(aes(x=Generation, y=value, group=1)) +
    geom_line() +
    geom_point(color = "red") +
    facet_wrap(~Stats) +
    labs(y="Average Stats")

The average for each stat across Generations. We see a downtick in Generation 2 for Attack, SpAtk, and Speed. Defense, SpDef, and HP do not show much variance.

#https://drmowinckels.io/blog/adding-external-images-to-plots/
#download.file("https://cdn.bulbagarden.net/upload/5/56/242Blissey.png", "blissey.png")
# download.file("http://cdn.bulbagarden.net/upload/f/f8/242MS.png", "blissey-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/e/ea/113MS.png", "chansey-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/f/fa/202MS.png", "wobb-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/e/ec/321MS.png", "wailord-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/5/5a/594MS.png", "alo-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/e/e0/143XYMS.png", "snorlax-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/0/0d/289MS.png", "slaking-h-sprite.png")
# download.file("https://archives.bulbagarden.net/media/upload/e/e8/487MS.png", "g-alt-h-sprite.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/2f/487OMS.png", "g-o-h-sprite.png")
# download.file("https://archives.bulbagarden.net/media/upload/4/45/426MS.png", "drif-h-sprite.png")
hp_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/",
                                       pattern = "-h-"), 
                     pokemon %>% 
  top_n(10, HP) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(HP)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
hp_sprites
## # A tibble: 10 x 3
##    images               Name                   rank
##    <chr>                <chr>                 <int>
##  1 drif-h-sprite.png    Drifblim                  1
##  2 g-alt-h-sprite.png   GiratinaAltered Forme     2
##  3 g-o-h-sprite.png     GiratinaOrigin Forme      3
##  4 slaking-h-sprite.png Slaking                   4
##  5 snorlax-h-sprite.png Snorlax                   5
##  6 alo-h-sprite.png     Alomomola                 6
##  7 wailord-h-sprite.png Wailord                   7
##  8 wobb-h-sprite.png    Wobbuffet                 8
##  9 chansey-h-sprite.png Chansey                   9
## 10 blissey-h-sprite.png Blissey                  10
img = readPNG("images/blissey.png")
g =  rasterGrob(img, interpolate=TRUE)
g_sprite = list()
hp_plot <- pokemon %>%
  select(Name, HP) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, HP), y=HP)) +
  geom_bar(aes(fill=HP), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=HP)) +
  scale_fill_gradient(low="#ff835a", high="#ffacac") + 
  coord_flip() +
  labs(x="Name", title="Top 10 HP Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=7, ymin=160, ymax=260)
for(i in 1:nrow(hp_sprites)){
  img = readPNG(paste0("images/",hp_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  hp_plot = hp_plot +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-15, ymax=2.5)
}

hp_plot
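One caveat with the sprite table above: it pairs file names with Pokemon only because `list.files()` and `arrange(Name)` happen to sort into the same order. A more explicit alternative is a named lookup vector. This is just a sketch: `sprite_files` is a name I made up, the file names follow the commented `download.file()` calls above, and the HP values are copied from the Top 10 table.

```r
library(dplyr)

# Hypothetical lookup: Pokemon name -> sprite file name.
sprite_files <- c(
  "Blissey"   = "blissey-h-sprite.png",
  "Chansey"   = "chansey-h-sprite.png",
  "Wobbuffet" = "wobb-h-sprite.png"
)

# Three rows copied from the HP table in this post.
top_hp <- tibble(
  Name = c("Wobbuffet", "Blissey", "Chansey"),
  HP   = c(190, 255, 250)
)

hp_lookup <- top_hp %>%
  mutate(images = unname(sprite_files[Name]),  # look up each file by name
         rank   = row_number(HP)) %>%
  arrange(rank)

hp_lookup
```

This way a renamed or missing file shows up as an `NA` in `images` instead of silently shifting every sprite onto the wrong bar.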

# download.file("https://archives.bulbagarden.net/media/upload/6/67/150MXMS.png", "mewtwo-a-x.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/72/214MMS.png", "hera-a-m.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ad/384MMS.png", "ray-a-m.png")
# download.file("https://archives.bulbagarden.net/media/upload/9/98/383PMS.png", "groudon-a-p.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/07/386AMS.png", "deoxys-a-aform.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c0/646BMS.png", "kyurem-a-b.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/7f/445MMS.png", "garch-a-m.png")
# download.file("https://archives.bulbagarden.net/media/upload/9/91/409MS.png", "ramp-a-sprite.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0a/475MMS.png", "gallade-a-m.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/5e/354MMS.png", "banette-a-m.png")
atk_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/",
                                        pattern = "-a-"), pokemon %>% 
  top_n(10, Attack) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(Attack)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
atk_sprites
## # A tibble: 10 x 3
##    images             Name                     rank
##    <chr>              <chr>                   <int>
##  1 banette-a-m.png    BanetteMega Banette         1
##  2 gallade-a-m.png    GalladeMega Gallade         2
##  3 ramp-a-sprite.png  Rampardos                   3
##  4 garch-a-m.png      GarchompMega Garchomp       4
##  5 kyurem-a-b.png     KyuremBlack Kyurem          5
##  6 deoxys-a-aform.png DeoxysAttack Forme          6
##  7 groudon-a-p.png    GroudonPrimal Groudon       7
##  8 ray-a-m.png        RayquazaMega Rayquaza       8
##  9 hera-a-m.png       HeracrossMega Heracross     9
## 10 mewtwo-a-x.png     MewtwoMega Mewtwo X        10
#download.file("https://cdn.bulbagarden.net/upload/7/7f/150Mewtwo-Mega_X.png", "mega-X.png")
img = readPNG("images/mega-X.png")
g =  rasterGrob(img, interpolate=TRUE)
g_sprite = list()
atk_graph <- pokemon %>%
  select(Name, Attack) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, Attack), y=Attack)) +
  geom_bar(aes(fill=Attack), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=Attack)) +
  scale_fill_gradient(low="#b4acc5", high="#6a319c") + 
  coord_flip() +
  labs(x="Name", title="Top 10 Attack Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=155, ymax=210)

for(i in 1:nrow(atk_sprites)){
  img = readPNG(paste0("images/",atk_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  atk_graph = atk_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

atk_graph

# download.file("https://archives.bulbagarden.net/media/upload/2/29/181MMS.png", "ampharos-spa-,.png")
# download.file("https://archives.bulbagarden.net/media/upload/3/34/282MMS.png", "gardevoir-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/f4/094MMS.png", "gengar-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/64/720UMS.png", "hoopa-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/74/646WMS.png", "kyurem-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0c/065MMS.png", "alakazam-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/07/386AMS.png", "deoxys-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/02/382PMS.png", "kyogre-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ad/384MMS.png", "ray-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/ff/150MYMS.png", "mewtwo-spa-.png")
spatk_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/",pattern = "-spa-"), pokemon %>% 
  top_n(10, SpAtk) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(SpAtk)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
spatk_sprites
## # A tibble: 10 x 3
##    images             Name                     rank
##    <chr>              <chr>                   <int>
##  1 ampharos-spa-,.png AmpharosMega Ampharos       1
##  2 gardevoir-spa-.png GardevoirMega Gardevoir     2
##  3 gengar-spa-.png    GengarMega Gengar           3
##  4 hoopa-spa-.png     HoopaHoopa Unbound          4
##  5 kyurem-spa-.png    KyuremWhite Kyurem          5
##  6 alakazam-spa-.png  AlakazamMega Alakazam       6
##  7 deoxys-spa-.png    DeoxysAttack Forme          7
##  8 kyogre-spa-.png    KyogrePrimal Kyogre         8
##  9 ray-spa-.png       RayquazaMega Rayquaza       9
## 10 mewtwo-spa-.png    MewtwoMega Mewtwo Y        10
#download.file("https://cdn.bulbagarden.net/upload/5/5f/150Mewtwo-Mega_Y.png", "mega-Y.png")
img = readPNG("images/mega-Y.png")
g =  rasterGrob(img, interpolate=TRUE)
spatk_graph <- pokemon %>%
  select(Name, SpAtk) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, SpAtk), y=SpAtk)) +
  geom_bar(aes(fill=SpAtk), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=SpAtk)) +
  scale_fill_gradient(low="#b4acc5", high="#6a319c") + 
  coord_flip() +
  labs(x="Name", title="Top 10 SpAtk Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=4, ymin=160, ymax=210)

for(i in 1:nrow(spatk_sprites)){
  img = readPNG(paste0("images/",spatk_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  spatk_graph = spatk_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

spatk_graph

# download.file("https://archives.bulbagarden.net/media/upload/5/5b/306MMS.png", "aggron-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/d/d5/306MS.png", "aggron-d-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/53/713MS.png", "avalugg-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/e/ef/411MS.png", "bastiodon-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ac/091XYMS.png", "cloyster-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/6c/377MS.png", "regirock-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0a/213MS.png", "shuckle-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/65/080MMS.png", "slobrow-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/b/bf/208MS.png", "steelix-d-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/05/208MMS.png", "steelix-d.png")
def_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/",
                                        pattern = "-d"), pokemon %>% 
  top_n(10, Defense) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(Defense)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
def_sprites
## # A tibble: 10 x 3
##    images          Name                 rank
##    <chr>           <chr>               <int>
##  1 bastiodon-d.png Bastiodon               1
##  2 aggron-d-n.png  Aggron                  2
##  3 cloyster-d.png  Cloyster                3
##  4 slobrow-d.png   SlowbroMega Slowbro     4
##  5 avalugg-d.png   Avalugg                 5
##  6 regirock-d.png  Regirock                6
##  7 steelix-d-n.png Steelix                 7
##  8 aggron-d.png    AggronMega Aggron       8
##  9 shuckle-d.png   Shuckle                 9
## 10 steelix-d.png   SteelixMega Steelix    10
#download.file("https://cdn.bulbagarden.net/upload/1/1b/208Steelix-Mega.png", "mega-steel.png")
img = readPNG("images/mega-steel.png")
g =  rasterGrob(img, interpolate=TRUE)
def_graph <- pokemon %>%
  select(Name, Defense) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, Defense), y=Defense)) +
  geom_bar(aes(fill=Defense), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=Defense)) +
  scale_fill_gradient(low="#dee6de", high="#7b94a4") + 
  coord_flip() +
  labs(x="Name", title="Top 10 Defense Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=180, ymax=245)

for(i in 1:nrow(def_sprites)){
  img = readPNG(paste0("images/",def_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  def_graph = def_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

def_graph

# download.file("https://archives.bulbagarden.net/media/upload/3/37/681MS.png", "aegislash-spd,.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/5b/703MS.png", "carbink-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/20/386DMS.png", "deoxys-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/53/719MS.png", "diancie-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/e/ea/671MS.png", "florges-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/3/3c/706MS.png", "goodra-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/e/ee/250MS.png", "Hooh-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/02/382PMS.png", "kyogre-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/24/380MMS.png", "latias-spd-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c8/249MS.png", "lugia-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/aa/476MS.png", "probopass-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/9/99/378MS.png", "regice-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/2e/379MS.png", "registeel-spd-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0a/213MS.png", "shuckle-spd.png")
spdef_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/", pattern = "-spd"), pokemon %>% 
  top_n(10, SpDef) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(SpDef)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
spdef_sprites
## # A tibble: 14 x 3
##    images              Name                   rank
##    <chr>               <chr>                 <int>
##  1 aegislash-spd,.png  AegislashShield Forme     1
##  2 carbink-spd.png     Carbink                   2
##  3 diancie-spd.png     Diancie                   3
##  4 goodra-spd.png      Goodra                    4
##  5 latias-spd-n.png    LatiasMega Latias         5
##  6 probopass-spd.png   Probopass                 6
##  7 registeel-spd-n.png Registeel                 7
##  8 florges-spd.png     Florges                   8
##  9 Hooh-spd.png        Ho-oh                     9
## 10 lugia-spd.png       Lugia                    10
## 11 deoxys-spd.png      DeoxysDefense Forme      11
## 12 kyogre-spd.png      KyogrePrimal Kyogre      12
## 13 regice-spd.png      Regice                   13
## 14 shuckle-spd.png     Shuckle                  14
#download.file("https://cdn.bulbagarden.net/upload/c/c7/213Shuckle.png", "shuckle.png")
img = readPNG("images/shuckle.png")
g =  rasterGrob(img, interpolate=TRUE)
spdef_graph <- pokemon %>%
  select(Name, SpDef) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, SpDef), y=SpDef)) +
  geom_bar(aes(fill=SpDef), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=SpDef)) +
  scale_fill_gradient(low="#ffff5a", high="#b43129") + 
  coord_flip() +
  labs(x="Name", title="Top 10 SpDef Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=180, ymax=230)

for(i in 1:nrow(spdef_sprites)){
  img = readPNG(paste0("images/", spdef_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  spdef_graph = spdef_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

spdef_graph

# download.file("https://archives.bulbagarden.net/media/upload/0/07/386AMS.png", "deoxys-spe-a.png")
# download.file("https://archives.bulbagarden.net/media/upload/8/86/386MS.png", "deoxys-spe-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/fa/386SMS.png", "deoxys-spe-s.png")
# download.file("https://archives.bulbagarden.net/media/upload/3/33/617MS.png", "accelgor-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c4/142MMS.png", "aerodac-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0c/065MMS.png", "alakazam-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c8/015MMS.png", "buzz-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/69/101MS.png", "electrocude-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/ff/150MYMS.png", "mewtwo-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/54/291MS.png", "ninjask-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/27/254MMS.png", "sceptile-spe.png")
speed_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/", pattern = "-spe"), pokemon %>% 
  top_n(10, Speed) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(Speed)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
speed_sprites
## # A tibble: 11 x 3
##    images              Name                       rank
##    <chr>               <chr>                     <int>
##  1 electrocude-spe.png Electrode                     1
##  2 mewtwo-spe.png      MewtwoMega Mewtwo Y           2
##  3 accelgor-spe.png    Accelgor                      3
##  4 buzz-spe.png        BeedrillMega Beedrill         4
##  5 sceptile-spe.png    SceptileMega Sceptile         5
##  6 aerodac-spe.png     AerodactylMega Aerodactyl     6
##  7 alakazam-spe.png    AlakazamMega Alakazam         7
##  8 deoxys-spe-a.png    DeoxysAttack Forme            8
##  9 deoxys-spe-n.png    DeoxysNormal Forme            9
## 10 ninjask-spe.png     Ninjask                      10
## 11 deoxys-spe-s.png    DeoxysSpeed Forme            11
#download.file("https://cdn.bulbagarden.net/upload/2/2b/386Deoxys-Speed.png", "speed.png")
img = readPNG("images/speed.png")
g =  rasterGrob(img, interpolate=TRUE)
speed_graph <- pokemon %>%
  select(Name, Speed) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, Speed), y=Speed)) +
  geom_bar(aes(fill=Speed), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=Speed)) +
  scale_fill_gradient(low="#5294ac", high="#ff734a") + 
  coord_flip() +
  labs(x="Name", title="Top 10 Speed Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=150, ymax=200)

for(i in 1:nrow(speed_sprites)){
  img = readPNG(paste0("images/",speed_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  speed_graph = speed_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

speed_graph

# download.file("https://archives.bulbagarden.net/media/upload/0/0e/493OD_DP.png", "arceus-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ae/719MMS.png", "diancie-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/7f/445MMS.png", "garchomp-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/9/98/383PMS.png", "groudon-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/02/382PMS.png", "kyogre-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c0/646BMS.png", "kyurem-tol-b.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/74/646WMS.png", "kyurem-tol-w.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/24/380MMS.png", "latias-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/2d/381MMS.png", "latios-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/54/376MMS.png", "metagross-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/67/150MXMS.png", "mewtwo-tol-x.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/ff/150MYMS.png", "mewtwo-tol-y.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ad/384MMS.png", "ray-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/4/4a/373MMS.png", "salamence-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/59/248MMS.png", "ttar-tol.png")
total_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/", pattern = "-tol"), pokemon %>% 
  top_n(10, Total) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(Total)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
total_sprites
## # A tibble: 15 x 3
##    images            Name                     rank
##    <chr>             <chr>                   <int>
##  1 diancie-tol.png   DiancieMega Diancie         1
##  2 garchomp-tol.png  GarchompMega Garchomp       2
##  3 kyurem-tol-b.png  KyuremBlack Kyurem          3
##  4 kyurem-tol-w.png  KyuremWhite Kyurem          4
##  5 latias-tol.png    LatiasMega Latias           5
##  6 latios-tol.png    LatiosMega Latios           6
##  7 metagross-tol.png MetagrossMega Metagross     7
##  8 salamence-tol.png SalamenceMega Salamence     8
##  9 ttar-tol.png      TyranitarMega Tyranitar     9
## 10 arceus-tol.png    Arceus                     10
## 11 groudon-tol.png   GroudonPrimal Groudon      11
## 12 kyogre-tol.png    KyogrePrimal Kyogre        12
## 13 mewtwo-tol-x.png  MewtwoMega Mewtwo X        13
## 14 mewtwo-tol-y.png  MewtwoMega Mewtwo Y        14
## 15 ray-tol.png       RayquazaMega Rayquaza      15
#download.file("https://cdn.bulbagarden.net/upload/5/58/384Rayquaza-Mega.png", "mega-raq.png")
img = readPNG("images/mega-raq.png")
g =  rasterGrob(img, interpolate=TRUE)
total_graph <- pokemon %>%
  select(Name, Total) %>%
  top_n(10, Total) %>%
  ggplot(aes(x=reorder(Name, Total), y=Total)) +
  geom_bar(aes(fill=Total), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=Total)) +
  coord_flip() +
  scale_fill_gradient(low="#f6de00", high="#5abd8b") + 
  labs(x="Name", title="Top 10 Total Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=600, ymax=900)

for(i in 1:nrow(total_sprites)){
  img = readPNG(paste0("images/",total_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  total_graph = total_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-40, ymax=10)
}

total_graph

pokemon %>% 
  ggplot(aes(x = PokedexNum, y = Total, color = classification)) +
  geom_point()

Again, Legendary and Mega Pokemon overlap heavily (with a few Normal Pokemon mixed in), while most Normal Pokemon sit well below both classes in Total.
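
The overlap can also be read off a per-class summary of Total. Here is a minimal sketch in base R, assuming a data frame with Total and classification columns. The ten rows are copied from the glimpse() outputs in this post, so they only illustrate the idea and say nothing about the full distribution.

```r
# Ten rows taken from the glimpse() outputs shown in this post (illustration only)
sample_rows <- data.frame(
  Total = c(318, 405, 525, 534,   # Normal
            625, 634, 634,        # Mega
            680, 600, 670),       # Legendary
  classification = c(rep("Normal", 4), rep("Mega", 3), rep("Legendary", 3))
)

# min / median / max of Total within each class (one column per class)
class_summary <- vapply(
  split(sample_rows$Total, sample_rows$classification),
  function(x) c(min = min(x), median = median(x), max = max(x)),
  numeric(3)
)
class_summary

# Two classes overlap when the lower class's max reaches the higher class's min
legendary_mega_overlap <- class_summary["min", "Legendary"] <= class_summary["max", "Mega"]
legendary_mega_overlap
```

Running the same summary on the full `pokemon` data frame (`split(pokemon$Total, pokemon$classification)`) would give the ranges behind the scatter plot.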

EDA (Generation)

pokemon %>%
  count(Generation) %>%
  ggplot(aes(x=Generation, y=n, fill = Generation, color = Generation)) + 
  geom_bar(stat="identity") +
  geom_label(aes(label=n)) +
  labs(x="Generation", y="Number of Pokémon",
       title="Number of Pokémon per generation") +
  scale_fill_manual(values = c("#2062ac", "#deac00", "#ff2029", "#cdb4d5", "#181820", "#6275b9"),
                    guide = "none") +
  scale_color_manual(values = c("white", "white", "white", "white", "white", "white"),
                    guide = "none")

Generations 1, 3, and 5 each introduce a similar number of Pokemon (~160), while the other three Generations vary more, with the latest Generation introducing the fewest new Pokemon.
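
Because variant forms get their own row, the bar heights count rows, not distinct Pokedex entries. Here is a minimal base R sketch of how the two counts can be separated, assuming PokedexNum and Generation columns as in this post. The eleven rows are a small illustrative subset (Venusaur and Charizard with their Mega forms, plus the four Deoxys formes), not the real totals.

```r
# Illustrative subset: Pokedex numbers repeat when a Pokemon has variant forms
forms <- data.frame(
  PokedexNum = c(1, 2, 3, 3, 6, 6, 6, 386, 386, 386, 386),
  Generation = factor(c(rep(1, 7), rep(3, 4)))
)

# Rows per generation (what count(Generation) reports)
rows_per_gen <- table(forms$Generation)

# Distinct Pokedex numbers per generation (variant forms collapsed)
species_per_gen <- tapply(forms$PokedexNum, forms$Generation,
                          function(x) length(unique(x)))

data.frame(
  Generation = names(species_per_gen),
  rows       = as.integer(rows_per_gen),
  species    = as.integer(species_per_gen)
)
```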

ggplot(pokemon, aes(x=Type1, fill=Generation)) + 
  geom_bar() +
  labs(x="Primary Type", y="Number of Pokémon",
       title="Number of Pokémon of each primary type per generation") +
  scale_fill_manual(values = c("#2062ac", "#deac00", "#ff2029", "#cdb4d5", "#181820", "#6275b9"),
                    guide = "none")

We see a similar distribution of types between Generations.

ggplot(pokemon, aes(x=Generation, fill=Type1)) + 
  geom_bar() +
  labs(x="Generation", y="Number of Pokémon",
       title="Number of Pokémon of each primary type per generation") +
  scale_fill_manual(values = type_colors)

The same information is conveyed in this graph with the variables flipped.

ggplot(pokemon, aes(x=Generation, fill=isMultiType)) + 
  geom_bar() +
  labs(x="Generation", y="Number of Pokémon",
       title="Number of single-type and multi-type Pokémon per generation")

About half of Pokemon in any generation have a secondary typing.
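
The "about half" reading can be checked numerically by taking the mean of the multi-type flag within each generation. Here is a minimal base R sketch, assuming Generation and Type2 columns as in this post. The thirteen rows are copied from the glimpse() outputs, so the shares printed here are far too small a sample to match the chart.

```r
# Thirteen rows taken from the glimpse() outputs in this post (illustration only)
sample_rows <- data.frame(
  Generation = c(1, 1, 1, 1, 1, 1, 1, 4, 3, 4, 6, 5, 4),
  Type2 = c("Poison", "Poison", "Poison", "Poison", NA, NA, "Flying",
            "Dragon", "Psychic", NA, "Water", "Flying", "Dragon")
)

# Same definition as isMultiType: a Pokemon is multi-type when Type2 is present
sample_rows$isMultiType <- !is.na(sample_rows$Type2)

# The mean of a logical vector is the share of TRUE values
share_multi <- tapply(sample_rows$isMultiType, sample_rows$Generation, mean)
round(share_multi, 2)
```

On the full data, `tapply(pokemon$isMultiType, pokemon$Generation, mean)` gives one share per generation to compare against 0.5.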

ggplot(pokemon, aes(x=Generation, fill=isMega)) + 
  geom_bar() +
  labs(x="Generation", y="Number of Pokémon",
       title="Number of Mega and non-Mega Pokémon per generation")

Megas are concentrated in Pokemon from older Generations. This is in line with how they were introduced: Mega Evolution debuted in Generation 6 and was handed out almost entirely to Pokemon from earlier Generations.
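
Since the isMega flag is derived by matching text in the Name column, it is worth checking the pattern against a name that merely starts with "Mega". Here is a minimal sketch using three names: two Mega forms as they are spelled in this dataset, and Meganium, a regular Generation 2 Pokemon.

```r
names_to_check <- c("VenusaurMega Venusaur", "MewtwoMega Mewtwo Y", "Meganium")

# Bare "Mega" also matches Meganium
loose <- grepl("Mega", names_to_check)

# "Mega " (with a trailing space, matched literally) only matches the Mega forms
strict <- grepl("Mega ", names_to_check, fixed = TRUE)

data.frame(Name = names_to_check, loose = loose, strict = strict)
```

With the looser pattern, Meganium would be counted as one extra Mega in Generation 2.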

Modelling

library(class) #for KNN
library(caret) #for cross validation of KNN method
test <- pokemon %>% group_by(classification) %>% sample_frac(.2)
train_data <- setdiff(pokemon, test)

dim(test)
## [1] 160  18
dim(train_data)
## [1] 640  18
glimpse(test)
## Rows: 160
## Columns: 18
## Groups: classification [3]
## $ PokedexNum     <int> 483, 380, 486, 721, 642, 487, 494, 646, 718, 386, 640,…
## $ Name           <chr> "Dialga", "Latias", "Regigigas", "Volcanion", "Thundur…
## $ Type1          <chr> "Steel", "Dragon", "Normal", "Fire", "Electric", "Ghos…
## $ Type2          <chr> "Dragon", "Psychic", NA, "Water", "Flying", "Dragon", …
## $ Total          <int> 680, 600, 670, 600, 580, 680, 600, 660, 600, 600, 580,…
## $ HP             <int> 100, 80, 110, 80, 79, 150, 100, 125, 108, 50, 91, 75, …
## $ Attack         <int> 120, 80, 160, 110, 115, 120, 100, 130, 100, 70, 90, 12…
## $ Defense        <int> 120, 90, 110, 120, 70, 100, 100, 90, 121, 160, 72, 70,…
## $ SpAtk          <int> 150, 110, 80, 130, 125, 120, 100, 130, 81, 70, 90, 125…
## $ SpDef          <int> 100, 130, 110, 90, 80, 100, 100, 90, 95, 160, 129, 70,…
## $ Speed          <int> 90, 110, 100, 70, 111, 90, 100, 95, 95, 90, 108, 115, …
## $ Generation     <fct> 4, 3, 4, 6, 5, 4, 5, 5, 6, 3, 5, 4, 3, 1, 4, 4, 2, 1, …
## $ Legendary      <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
## $ AtkTotal       <dbl> 270, 190, 240, 240, 240, 240, 200, 260, 181, 140, 180,…
## $ DefTotal       <dbl> 220, 220, 220, 210, 150, 200, 200, 180, 216, 320, 201,…
## $ isMega         <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
## $ isMultiType    <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ classification <fct> Legendary, Legendary, Legendary, Legendary, Legendary,…
glimpse(train_data)
## Rows: 640
## Columns: 18
## $ PokedexNum     <int> 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, 1…
## $ Name           <chr> "Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Cha…
## $ Type1          <chr> "Grass", "Grass", "Grass", "Fire", "Fire", "Fire", "Fi…
## $ Type2          <chr> "Poison", "Poison", "Poison", NA, NA, "Flying", "Flyin…
## $ Total          <int> 318, 405, 525, 309, 405, 534, 634, 314, 405, 530, 630,…
## $ HP             <int> 45, 60, 80, 39, 58, 78, 78, 44, 59, 79, 79, 45, 50, 60…
## $ Attack         <int> 49, 62, 82, 52, 64, 84, 104, 48, 63, 83, 103, 30, 20, …
## $ Defense        <int> 49, 63, 83, 43, 58, 78, 78, 65, 80, 100, 120, 35, 55, …
## $ SpAtk          <int> 65, 80, 100, 60, 80, 109, 159, 50, 65, 85, 135, 20, 25…
## $ SpDef          <int> 65, 80, 100, 50, 65, 85, 115, 64, 80, 105, 115, 20, 25…
## $ Speed          <int> 45, 60, 80, 65, 80, 100, 100, 43, 58, 78, 78, 45, 30, …
## $ Generation     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Legendary      <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
## $ AtkTotal       <dbl> 114, 142, 182, 112, 144, 193, 263, 98, 128, 168, 238, …
## $ DefTotal       <dbl> 114, 143, 183, 93, 123, 163, 193, 129, 160, 205, 235, …
## $ isMega         <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,…
## $ isMultiType    <lgl> TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FAL…
## $ classification <fct> Normal, Normal, Normal, Normal, Normal, Normal, Mega, …
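
The split above draws 20% of each class at random, so unless a seed was set earlier, the selected rows (and every number that follows) change from run to run. Here is a minimal base R sketch of a seeded stratified split. The `toy` data frame, its class sizes, and the seed value are all made up for illustration.

```r
set.seed(42)  # arbitrary seed, chosen only so the sketch is repeatable

# Hypothetical data: 100 rows with an imbalanced three-level class
toy <- data.frame(
  id = 1:100,
  classification = factor(rep(c("Legendary", "Mega", "Normal"),
                              times = c(10, 10, 80)))
)

# Draw 20% of the row indices within each class
idx_by_class <- split(seq_len(nrow(toy)), toy$classification)
test_idx <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.2 * length(idx)))]
}), use.names = FALSE)

toy_test  <- toy[test_idx, ]
toy_train <- toy[-test_idx, ]

table(toy_test$classification)
```

Calling `set.seed()` once before the `sample_frac()` line should have the same stabilising effect on the dplyr version.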
class.knn.20 = knn(
  train = train_data[6:11], # training features (columns 6-11: HP through Speed)
  test = test[6:11], # test features (same six stat columns)
  cl = train_data$classification, # class labels for the training rows
  k = 20)
class.knn.20
##   [1] Legendary Normal    Normal    Normal    Normal    Legendary Legendary
##   [8] Legendary Normal    Normal    Normal    Normal    Legendary Normal   
##  [15] Normal    Normal    Normal    Legendary Legendary Normal    Normal   
##  [22] Legendary Normal    Normal    Normal    Normal    Normal    Normal   
##  [29] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [36] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [43] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [50] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [57] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [64] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [71] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [78] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [85] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [92] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [99] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [106] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [113] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [120] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [127] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [134] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [141] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [148] Normal    Normal    Normal    Normal    Normal    Legendary Normal   
## [155] Normal    Normal    Normal    Normal    Normal    Normal   
## Levels: Legendary Mega Normal
tibble(test[c("Name", "classification")], class.knn.20)
## # A tibble: 160 x 3
##    Name                     classification class.knn.20
##    <chr>                    <fct>          <fct>       
##  1 Dialga                   Legendary      Legendary   
##  2 Latias                   Legendary      Normal      
##  3 Regigigas                Legendary      Normal      
##  4 Volcanion                Legendary      Normal      
##  5 ThundurusIncarnate Forme Legendary      Normal      
##  6 GiratinaOrigin Forme     Legendary      Legendary   
##  7 Victini                  Legendary      Legendary   
##  8 Kyurem                   Legendary      Legendary   
##  9 Zygarde50% Forme         Legendary      Normal      
## 10 DeoxysDefense Forme      Legendary      Normal      
## # … with 150 more rows
class.knn.conf.20 = table(true = test$classification, predicted = class.knn.20)
class.knn.conf.20
##            predicted
## true        Legendary Mega Normal
##   Legendary         4    0      8
##   Mega              4    0      6
##   Normal            1    0    137
class.knn.50 = knn(
  train = train_data[6:11], # training features (columns 6-11: HP through Speed)
  test = test[6:11], # test features (same six stat columns)
  cl = train_data$classification, # class labels for the training rows
  k = 50)
class.knn.50
##   [1] Legendary Normal    Normal    Normal    Normal    Legendary Normal   
##   [8] Legendary Normal    Normal    Normal    Normal    Legendary Normal   
##  [15] Normal    Normal    Normal    Legendary Normal    Normal    Normal   
##  [22] Legendary Normal    Normal    Normal    Normal    Normal    Normal   
##  [29] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [36] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [43] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [50] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [57] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [64] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [71] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [78] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [85] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [92] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [99] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [106] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [113] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [120] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [127] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [134] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [141] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [148] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [155] Normal    Normal    Normal    Normal    Normal    Normal   
## Levels: Legendary Mega Normal
tibble(test[c("Name", "classification")], class.knn.50)
## # A tibble: 160 x 3
##    Name                     classification class.knn.50
##    <chr>                    <fct>          <fct>       
##  1 Dialga                   Legendary      Legendary   
##  2 Latias                   Legendary      Normal      
##  3 Regigigas                Legendary      Normal      
##  4 Volcanion                Legendary      Normal      
##  5 ThundurusIncarnate Forme Legendary      Normal      
##  6 GiratinaOrigin Forme     Legendary      Legendary   
##  7 Victini                  Legendary      Normal      
##  8 Kyurem                   Legendary      Legendary   
##  9 Zygarde50% Forme         Legendary      Normal      
## 10 DeoxysDefense Forme      Legendary      Normal      
## # … with 150 more rows
class.knn.conf.50 = table(true = test$classification, predicted = class.knn.50)
class.knn.conf.50
##            predicted
## true        Legendary Mega Normal
##   Legendary         3    0      9
##   Mega              3    0      7
##   Normal            0    0    138
trControl <- trainControl(method  = "cv",
                          number  = 20)

fit <- train(classification ~  HP + Attack + Defense + SpAtk + SpDef + Speed,
             method     = "knn",
             tuneGrid   = expand.grid(k = 1:50),
             trControl  = trControl,
             metric     = "Accuracy",
             data       = train_data
)
fit
## k-Nearest Neighbors 
## 
## 640 samples
##   6 predictor
##   3 classes: 'Legendary', 'Mega', 'Normal' 
## 
## No pre-processing
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 607, 608, 609, 608, 608, 607, ... 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa    
##    1  0.8986214  0.5449567
##    2  0.9002343  0.5652602
##    3  0.8970146  0.5203502
##    4  0.8939874  0.4883874
##    5  0.9048332  0.5444305
##    6  0.9000388  0.5020137
##    7  0.9078070  0.5359710
##    8  0.9030691  0.4852282
##    9  0.9062949  0.5168652
##   10  0.9016990  0.4879702
##   11  0.9061941  0.4873755
##   12  0.9046316  0.4882326
##   13  0.9031164  0.4779515
##   14  0.8984763  0.4450063
##   15  0.9000861  0.4431106
##   16  0.8999914  0.4413798
##   17  0.8921759  0.3921029
##   18  0.8921728  0.3840586
##   19  0.8905630  0.3777948
##   20  0.8921255  0.3767443
##   21  0.8905630  0.3474226
##   22  0.8922297  0.3750946
##   23  0.8876369  0.3359588
##   24  0.8891994  0.3402897
##   25  0.8891047  0.3350749
##   26  0.8906672  0.3301477
##   27  0.8923274  0.3554081
##   28  0.8954051  0.3607145
##   29  0.8891520  0.3346144
##   30  0.8908123  0.3468491
##   31  0.8907619  0.3361628
##   32  0.8891994  0.3332768
##   33  0.8891520  0.3235309
##   34  0.8891520  0.3235309
##   35  0.8907145  0.3349036
##   36  0.8906672  0.3328042
##   37  0.8891520  0.3319202
##   38  0.8891047  0.3340157
##   39  0.8891047  0.3301286
##   40  0.8859228  0.3053315
##   41  0.8859228  0.2920119
##   42  0.8859228  0.2920119
##   43  0.8843603  0.2919405
##   44  0.8874853  0.3032871
##   45  0.8859702  0.2948302
##   46  0.8890478  0.3059627
##   47  0.8890478  0.3014605
##   48  0.8890478  0.3014605
##   49  0.8890478  0.2975734
##   50  0.8890478  0.2930713
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.
k = fit$results$k[which.max(fit$results$Accuracy)]
class.knn = knn(
  train = train_data[6:11], # training features (columns 6-11: HP through Speed)
  test = test[6:11], # test features (same six stat columns)
  cl = train_data$classification, # class labels for the training rows
  k = k)
class.knn
##   [1] Legendary Normal    Mega      Legendary Legendary Legendary Normal   
##   [8] Legendary Legendary Normal    Normal    Normal    Mega      Normal   
##  [15] Mega      Normal    Normal    Mega      Legendary Legendary Normal   
##  [22] Legendary Normal    Normal    Normal    Normal    Normal    Normal   
##  [29] Normal    Legendary Normal    Normal    Normal    Normal    Normal   
##  [36] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [43] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [50] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [57] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [64] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [71] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [78] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [85] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [92] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
##  [99] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [106] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [113] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [120] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [127] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [134] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [141] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [148] Normal    Normal    Normal    Normal    Normal    Normal    Normal   
## [155] Normal    Normal    Normal    Normal    Normal    Normal   
## Levels: Legendary Mega Normal
error_knn <- tibble(test[c("Name", "classification")], class.knn) %>% 
  filter(classification != class.knn)
error_knn
## # A tibble: 14 x 3
##    Name                  classification class.knn
##    <chr>                 <fct>          <fct>    
##  1 Latias                Legendary      Normal   
##  2 Regigigas             Legendary      Mega     
##  3 Victini               Legendary      Normal   
##  4 DeoxysDefense Forme   Legendary      Normal   
##  5 Virizion              Legendary      Normal   
##  6 Azelf                 Legendary      Normal   
##  7 BeedrillMega Beedrill Mega           Normal   
##  8 LopunnyMega Lopunny   Mega           Normal   
##  9 SteelixMega Steelix   Mega           Normal   
## 10 SceptileMega Sceptile Mega           Legendary
## 11 VenusaurMega Venusaur Mega           Legendary
## 12 SwampertMega Swampert Mega           Normal   
## 13 LatiasMega Latias     Mega           Legendary
## 14 Cresselia             Normal         Legendary
class.knn.conf = table(true = test$classification, predicted = class.knn)
class.knn.conf
##            predicted
## true        Legendary Mega Normal
##   Legendary         6    1      5
##   Mega              3    3      4
##   Normal            1    0    137
nrow(error_knn)
## [1] 14
error_rate = nrow(error_knn) / nrow(test)
error_rate
## [1] 0.0875
detach("package:class", unload = TRUE)
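
Overall accuracy hides how each class fares, so it helps to read per-class recall (the diagonal divided by the row totals) off the confusion matrices. The sketch below re-enters the three matrices printed above for k = 20, k = 50, and the cross-validated k = 7. The helper name `make_conf` is made up for this example.

```r
# Rebuild a 3x3 confusion matrix from counts listed row by row (true class in rows)
make_conf <- function(counts) {
  classes <- c("Legendary", "Mega", "Normal")
  matrix(counts, nrow = 3, byrow = TRUE,
         dimnames = list(true = classes, predicted = classes))
}

# Counts copied from the confusion matrices printed above
conf <- list(
  k20 = make_conf(c(4, 0, 8,   4, 0, 6,   1, 0, 137)),
  k50 = make_conf(c(3, 0, 9,   3, 0, 7,   0, 0, 138)),
  k7  = make_conf(c(6, 1, 5,   3, 3, 4,   1, 0, 137))
)

# Recall per class: correctly predicted / all rows that truly belong to the class
recall <- sapply(conf, function(m) diag(m) / rowSums(m))

# Overall accuracy: correct predictions / all predictions
accuracy <- sapply(conf, function(m) sum(diag(m)) / sum(m))

round(recall, 3)
round(accuracy, 4)
```

The Mega row shows why the headline accuracy is misleading: at k = 20 and k = 50 no test Pokemon is ever predicted as Mega, and the accuracy comes almost entirely from the Normal class.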

Modelling (Random Oversampling)

# Despite its name, testing_df holds the training rows (label plus the six stats)
testing_df <- train_data %>% 
  select(classification, HP, Attack, Defense, SpAtk, SpDef, Speed)

utrain <- upSample(testing_df[,-1], testing_df$classification)

table(utrain$Class)
## 
## Legendary      Mega    Normal 
##       554       554       554
fit <- train(Class ~  .,
             method     = "knn",
             tuneGrid   = expand.grid(k = 1:50),
             trControl  = trControl,
             metric     = "Accuracy",
             data       = utrain
)
fit
## k-Nearest Neighbors 
## 
## 1662 samples
##    6 predictor
##    3 classes: 'Legendary', 'Mega', 'Normal' 
## 
## No pre-processing
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 1580, 1578, 1579, 1578, 1579, 1578, ... 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa    
##    1  0.9897870  0.9846803
##    2  0.9789428  0.9684118
##    3  0.9729110  0.9593646
##    4  0.9693033  0.9539542
##    5  0.9638813  0.9458223
##    6  0.9560568  0.9340810
##    7  0.9446679  0.9169943
##    8  0.9320306  0.8980401
##    9  0.9109588  0.8664401
##   10  0.8965069  0.8447613
##   11  0.8731050  0.8096610
##   12  0.8526432  0.7789634
##   13  0.8520262  0.7780401
##   14  0.8478452  0.7717811
##   15  0.8387867  0.7581726
##   16  0.8315714  0.7473524
##   17  0.8273755  0.7410587
##   18  0.8201534  0.7302256
##   19  0.8195582  0.7293107
##   20  0.8219608  0.7329182
##   21  0.8207123  0.7310522
##   22  0.8285735  0.7428740
##   23  0.8261781  0.7392506
##   24  0.8279561  0.7419504
##   25  0.8274483  0.7411740
##   26  0.8285957  0.7428943
##   27  0.8309260  0.7463805
##   28  0.8321672  0.7482451
##   29  0.8327549  0.7491222
##   30  0.8357890  0.7536641
##   31  0.8376396  0.7564451
##   32  0.8376111  0.7564207
##   33  0.8278775  0.7418213
##   34  0.8309337  0.7464250
##   35  0.8206990  0.7310590
##   36  0.8249087  0.7373734
##   37  0.8339385  0.7509232
##   38  0.8363700  0.7545758
##   39  0.8333431  0.7500273
##   40  0.8333652  0.7500503
##   41  0.8340038  0.7510316
##   42  0.8333793  0.7500933
##   43  0.8369870  0.7555082
##   44  0.8406235  0.7609784
##   45  0.8430333  0.7645757
##   46  0.8478167  0.7717541
##   47  0.8412331  0.7618986
##   48  0.8382280  0.7573816
##   49  0.8363842  0.7546206
##   50  0.8297644  0.7446846
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 1.
k = fit$results$k[which.max(fit$results$Accuracy)]
library(class) #for KNN

class.knn = knn(
  train = utrain[,-7], # oversampled training features (column 7, Class, dropped)
  test = test[6:11], # test features (columns 6-11: HP through Speed)
  cl = utrain$Class, # class labels for the oversampled training rows
  k = k)
error_knn <- tibble(test[c("Name", "classification")], class.knn) %>% 
  filter(classification != class.knn)
error_knn
## # A tibble: 11 x 3
##    Name                      classification class.knn
##    <chr>                     <fct>          <fct>    
##  1 Regigigas                 Legendary      Mega     
##  2 Volcanion                 Legendary      Mega     
##  3 DeoxysDefense Forme       Legendary      Normal   
##  4 Virizion                  Legendary      Normal   
##  5 BeedrillMega Beedrill     Mega           Normal   
##  6 CharizardMega Charizard X Mega           Normal   
##  7 SceptileMega Sceptile     Mega           Legendary
##  8 SwampertMega Swampert     Mega           Normal   
##  9 LatiasMega Latias         Mega           Legendary
## 10 Cresselia                 Normal         Legendary
## 11 Manaphy                   Normal         Legendary
class.knn.conf = table(true = test$classification, predicted = class.knn)
class.knn.conf
##            predicted
## true        Legendary Mega Normal
##   Legendary         8    2      2
##   Mega              2    5      3
##   Normal            2    0    136
nrow(error_knn)
## [1] 11
error_rate = nrow(error_knn) / nrow(test)
error_rate
## [1] 0.06875
detach("package:class", unload = TRUE)
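
One thing to keep in mind when reading the cross-validated accuracies above: the upsampled rows are copies of existing rows, and the folds are drawn after the copying. A held-out fold can therefore contain rows whose exact duplicates sit in the training folds, which would favour k = 1. Here is a minimal base R sketch of that effect, with made-up sizes (five original rows, ten copies each, five folds) and an arbitrary seed.

```r
set.seed(1)  # arbitrary seed for illustration

# Five hypothetical minority-class rows, each copied 10 times by oversampling
original_id <- rep(1:5, each = 10)

# Assign the 50 copies to 5 equally sized folds at random
fold <- sample(rep(1:5, times = 10))

# A row "leaks" if a copy of the same original row lands in a different fold,
# i.e. it is in the training data whenever this row is held out
leaked <- vapply(seq_along(original_id), function(i) {
  any(original_id == original_id[i] & fold != fold[i])
}, logical(1))

mean(leaked)  # share of held-out rows with an exact duplicate in training
```

caret can also resample inside each fold through the `sampling` argument of `trainControl()` (for example `sampling = "up"`), which keeps copies out of the held-out fold. The results in this post do not use it.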

Modelling (Random Undersampling)

dtrain <- downSample(testing_df[,-1], testing_df$classification)

table(dtrain$Class)
## 
## Legendary      Mega    Normal 
##        39        39        39
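# NOTE: this call still fits on utrain (the oversampled set), so the output
# below repeats the oversampling fit; dtrain was presumably the intended data.
fit <- train(Class ~  .,
             method     = "knn",
             tuneGrid   = expand.grid(k = 1:50),
             trControl  = trControl,
             metric     = "Accuracy",
             data       = utrain
)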
fit
## k-Nearest Neighbors 
## 
## 1662 samples
##    6 predictor
##    3 classes: 'Legendary', 'Mega', 'Normal' 
## 
## No pre-processing
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 1579, 1578, 1579, 1579, 1580, 1579, ... 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa    
##    1  0.9897727  0.9846576
##    2  0.9807792  0.9711652
##    3  0.9729330  0.9593965
##    4  0.9681060  0.9521561
##    5  0.9651155  0.9476669
##    6  0.9566737  0.9350120
##    7  0.9476294  0.9214462
##    8  0.9356237  0.9034329
##    9  0.9158003  0.8737045
##   10  0.9031849  0.8547733
##   11  0.8682283  0.8023532
##   12  0.8550180  0.7825278
##   13  0.8514250  0.7771370
##   14  0.8465834  0.7698828
##   15  0.8423732  0.7635665
##   16  0.8351874  0.7527935
##   17  0.8261216  0.7392074
##   18  0.8201185  0.7302025
##   19  0.8195017  0.7292707
##   20  0.8200822  0.7301348
##   21  0.8200966  0.7301675
##   22  0.8225284  0.7337864
##   23  0.8279431  0.7418851
##   24  0.8321957  0.7482630
##   25  0.8267590  0.7401178
##   26  0.8214242  0.7321073
##   27  0.8244511  0.7366539
##   28  0.8231952  0.7347709
##   29  0.8250024  0.7374713
##   30  0.8334661  0.7501532
##   31  0.8346277  0.7519234
##   32  0.8394911  0.7592123
##   33  0.8292134  0.7437890
##   34  0.8244084  0.7365846
##   35  0.8214029  0.7320755
##   36  0.8262370  0.7393197
##   37  0.8286909  0.7430208
##   38  0.8340692  0.7511007
##   39  0.8371106  0.7556658
##   40  0.8334594  0.7501784
##   41  0.8352812  0.7529101
##   42  0.8370959  0.7556201
##   43  0.8382647  0.7573679
##   44  0.8376768  0.7564776
##   45  0.8370741  0.7555546
##   46  0.8413130  0.7619081
##   47  0.8400352  0.7600077
##   48  0.8382278  0.7572916
##   49  0.8322105  0.7482768
##   50  0.8267742  0.7401266
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 1.
k = fit$results$k[which.max(fit$results$Accuracy)]
library(class) #for KNN
library(caret) 

class.knn = knn(
  train = dtrain[,-7], # undersampled training features (column 7, Class, dropped)
  test = test[6:11], # test features (columns 6-11: HP through Speed)
  cl = dtrain$Class, # class labels for the undersampled training rows
  k = k)
error_knn <- tibble(test[c("Name", "classification")], class.knn) %>% 
  filter(classification != class.knn)
error_knn
## # A tibble: 24 x 3
##    Name                  classification class.knn
##    <chr>                 <fct>          <fct>    
##  1 Regigigas             Legendary      Mega     
##  2 Volcanion             Legendary      Mega     
##  3 BeedrillMega Beedrill Mega           Normal   
##  4 SceptileMega Sceptile Mega           Legendary
##  5 LatiasMega Latias     Mega           Legendary
##  6 Lapras                Normal         Legendary
##  7 Cresselia             Normal         Legendary
##  8 GourgeistSuper Size   Normal         Mega     
##  9 Crustle               Normal         Mega     
## 10 Politoed              Normal         Mega     
## # … with 14 more rows
class.knn.conf = table(true = test$classification, predicted = class.knn)
class.knn.conf
##            predicted
## true        Legendary Mega Normal
##   Legendary        10    2      0
##   Mega              2    7      1
##   Normal           10    9    119
nrow(error_knn)
## [1] 24
error_rate = nrow(error_knn) / nrow(test)
error_rate
## [1] 0.15
detach("package:class", unload = TRUE)
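
To compare the three runs side by side, the error rate can be set against balanced accuracy (the mean of the per-class recalls), which weights Legendary, Mega, and Normal equally. The sketch below re-enters the final confusion matrices printed in each Modelling section. The helper name `make_conf` and the labels `none`, `up`, and `down` are made up for this example.

```r
# Rebuild a 3x3 confusion matrix from counts listed row by row (true class in rows)
make_conf <- function(counts) {
  classes <- c("Legendary", "Mega", "Normal")
  matrix(counts, nrow = 3, byrow = TRUE,
         dimnames = list(true = classes, predicted = classes))
}

# Counts copied from the confusion matrices printed in this post
conf <- list(
  none = make_conf(c(6, 1, 5,    3, 3, 4,    1, 0, 137)),  # no resampling
  up   = make_conf(c(8, 2, 2,    2, 5, 3,    2, 0, 136)),  # random oversampling
  down = make_conf(c(10, 2, 0,   2, 7, 1,    10, 9, 119))  # random undersampling
)

# One row per strategy: overall error rate and mean per-class recall
summary_tbl <- t(sapply(conf, function(m) {
  recall <- diag(m) / rowSums(m)
  c(error_rate = 1 - sum(diag(m)) / sum(m),
    balanced_accuracy = mean(recall))
}))

round(summary_tbl, 3)
```

On these numbers the two metrics pull in opposite directions for undersampling: it has the highest error rate of the three but also the highest balanced accuracy, because it recovers more Legendary and Mega Pokemon at the cost of misclassifying Normal ones.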

Acknowledgements

Thank you to Alberto Barradas for the dataset. Thank you Xavier for the inspiration for some of my EDA graphs. PokePalettes was a huge help in determining many of the HTML color codes used in the graphs. Thank you to Bulbapedia for all the .png files seen in my graphs and for being an essential “Pokemon encyclopedia” for the entire community. And thank you to Pokemon, Game Freak, and Nintendo for all the great memories and for continuing to put out games so that generation after generation can experience the same euphoria.

Appendix

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
rm(list = ls())
library(tidyverse)
library(ggrepel)
library(png)
library(grid)
setwd("~/Desktop/code/R/Pokemon")
pokemon <- read_csv("Pokemon.csv")
glimpse(pokemon)
pokemon <- pokemon %>% 
  rename(SpAtk = `Sp. Atk`, SpDef = `Sp. Def`, Type1 = `Type 1`, Type2 = `Type 2`, PokedexNum = `#`)

pokemon <- pokemon %>% 
  mutate(AtkTotal = Attack + SpAtk,
         DefTotal = Defense + SpDef,
         isMega = grepl("Mega ", Name, fixed = TRUE), # trailing space keeps Meganium from matching
         isMultiType = !is.na(Type2),
         classification = if_else(isMega == TRUE, "Mega", 
                                  if_else(Legendary == TRUE, "Legendary", "Normal"))
         )

factor_cols = c("Generation", "classification")
int_cols = c("PokedexNum", "Total", "HP", "Attack", "Defense", "SpAtk", "SpDef", "Speed")

pokemon[factor_cols] <- lapply(pokemon[factor_cols], factor)
pokemon[int_cols] <- lapply(pokemon[int_cols], as.integer)

glimpse(pokemon)
totals <- pokemon %>% 
  group_by(Type1) %>% 
  summarise(count = n())
  
# Generation 1 Color Scheme
pokemon %>% 
  ggplot(aes(x = fct_infreq(Type1))) +
  geom_bar(fill = "#84ADD7", color = "#F2684A") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  labs(title = "Frequency of Primary Types", x = "Type", y = "Frequency") + 
  geom_text(aes(Type1, count + 5, label = count, fill = NULL), data = totals)
totals <- pokemon %>% 
  filter(!is.na(Type2)) %>% 
  group_by(Type2) %>% 
  summarise(count = n())

# Generation 2 Color Scheme
pokemon %>% 
  filter(!is.na(Type2)) %>% 
  ggplot(aes(x = fct_infreq(Type2))) +
  geom_bar(fill = "#C8CFD7", color = "#feff6a") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  labs(title = "Frequency of Secondary Types", x = "Type", y = "Frequency") +
  geom_text(aes(Type2, count + 5, label = count, fill = NULL), data = totals)
type_combinations <- pokemon %>%
  mutate(Type2 = ifelse(is.na(Type2), "", Type2)) %>% 
  group_by(Type1, Type2) %>%
  summarise(count=n())

#Pikachu Color Scheme
type_combinations %>% 
  ggplot(aes(x=Type1,y=as.character(Type2))) + 
  geom_tile(aes(fill = count), show.legend = FALSE) +
  geom_text(aes(label=count)) +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  labs(x="Type 1", y="Type 2",
       title="Type Combinations") +   
  scale_fill_gradient(low="#f6bd20", high="#c52018") 
# Bug, Dark, Dragon, Electric, Fairy, Fighting, Fire, Flying, Ghost, Grass, Ground, Ice, Normal, Poison, Psychic, Rock, Steel, Water
type_colors = c("#A8B820", "#705848", "#7038F8", "#F8D030", "#EE99AC", "#C03028","#F08030","#A890F0",
                "#705898", "#78C850", "#E0C068", "#98D8D8","#A8A878", "#A040A0", "#F85888", "#B8A038",
                "#B8B8D0", "#6890F0")

type_colors_outline = c("#C6D16E", "#49392F", "#4924A1", "#A1871F", "#9B6470", "#7D1F1A", "#9C531F",
                        "#6D5E9C", "#493963", "#4E8234", "#927D44", "#638D8D", "#6D6D4E", "#682A68",
                        "#A13959", "#786824", "#787887", "#445E9C")

pokemon %>% 
  ggplot(aes(x = Type1, y = Total, fill = Type1, color = Type1)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Stats by Primary Type") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none")
pokemon %>% 
  filter(Legendary == TRUE) %>% 
  ggplot(aes(x=Type1, fill = Type1, color = Type1)) +
  geom_bar(show.legend = FALSE) + 
  scale_fill_manual(values = type_colors[-c(1,6, 14)],
                    guide = "none") +
  scale_color_manual(values = type_colors_outline[-c(1,6, 14)],
                    guide = "none") +
  labs(title = "Primary Type of Legendary Pokemon")
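# The plot above drops the Bug, Fighting, and Poison colors by position
# (indices 1, 6, 14), which breaks quietly if the set of legendary types ever
# changes. A possible alternative, sketched here and not part of the original
# analysis: name each color by its type so scale_fill_manual() can match
# colors to levels by name. `name_palette` and the three-color demo are
# hypothetical helpers for illustration.
name_palette <- function(colors, labels) {
  # One label per color, otherwise the mapping would be ambiguous
  stopifnot(length(colors) == length(labels))
  setNames(colors, labels)
}

demo_palette <- name_palette(c("#A8B820", "#F08030", "#6890F0"),
                             c("Bug", "Fire", "Water"))
# Subsetting by name keeps each type tied to its own color
demo_palette[c("Fire", "Water")]
# With the full vectors above, the same idea might look like:
# scale_fill_manual(values = name_palette(type_colors, sort(unique(pokemon$Type1))))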
# Latios/Latias Color Scheme
pokemon %>% 
  ggplot(aes(fill = Legendary, x=Type1)) +
  geom_bar(position="stack") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) + 
  scale_fill_manual(values = c("#cd696e", "#7db5da")) +
  labs(title = "Legendary Pokemon by Primary Type", x = "Primary Type", y = "Frequency")
# Xerneas/Yveltal Color Scheme
pokemon %>% 
  ggplot(aes(fill = isMega, x=Type1)) +
  geom_bar(position="stack") +
  theme(axis.text.x=element_text(angle=45, hjust=1)) + 
  scale_fill_manual(values = c("#e9351c", "#6275b9")) +
  labs(title = "Mega Pokemon by Primary Type", x = "Primary Type", y = "Frequency")

is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

pokemon %>% 
  ggplot(aes(x = Type1, y = HP, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "HP by Primary Type")

pokemon %>% 
  filter(is_outlier(HP) == TRUE) %>% 
  mutate(HPPercent = round(HP / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, HP, HPPercent)
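# The filter above flags outliers against the HP distribution of all Pokemon,
# while the boxplot draws its whiskers within each primary type, so the table
# and the plotted points need not agree. A rough sketch of a per-type version
# follows; `iqr_outlier`, `flag_group_outliers`, and the toy data are
# assumptions for illustration, using the same 1.5 * IQR rule as is_outlier().
library(dplyr)

iqr_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

flag_group_outliers <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    mutate(outlier = iqr_outlier({{ value_col }})) %>%  # evaluated per group
    ungroup()
}

toy <- tibble(Type1 = rep(c("A", "B"), each = 5),
              HP = c(10, 11, 12, 13, 100, 50, 51, 52, 53, 54))
flagged <- flag_group_outliers(toy, Type1, HP)
filter(flagged, outlier)  # only the HP = 100 row in type "A"
# On the real data this might be called as:
# pokemon %>% flag_group_outliers(Type1, HP) %>% filter(outlier)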
pokemon %>% 
  ggplot(aes(x = Type1, y = Attack, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "Attack by Primary Type")

pokemon %>% 
  filter(is_outlier(Attack) == TRUE) %>% 
  mutate(AtkPercent = round(Attack / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Attack, AtkPercent)
pokemon %>% 
  ggplot(aes(x = Type1, y = Defense, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "Defense by Primary Type")

pokemon %>% 
  filter(is_outlier(Defense) == TRUE) %>% 
  mutate(DefPercent = round(Defense / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Defense, DefPercent)
pokemon %>% 
  ggplot(aes(x = Type1, y = SpAtk, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "SpAtk by Primary Type")

pokemon %>% 
  filter(is_outlier(SpAtk) == TRUE) %>% 
  mutate(SpAtkPercent = round(SpAtk / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, SpAtk, SpAtkPercent)
pokemon %>% 
  ggplot(aes(x = Type1, y = SpDef, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "SpDef by Primary Type")

pokemon %>% 
  filter(is_outlier(SpDef) == TRUE) %>% 
  mutate(SpDefPercent = round(SpDef / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, SpDef, SpDefPercent)
pokemon %>% 
  ggplot(aes(x = Type1, y = Speed, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none")  + 
  labs(title = "Speed by Primary Type")

pokemon %>% 
  filter(is_outlier(Speed) == TRUE) %>% 
  mutate(SpeedPercent = round(Speed / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Speed, SpeedPercent)
pokemon %>% 
  ggplot(aes(x = Type1, y = AtkTotal, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none") + 
  labs(title = "Total Atk by Primary Type")

pokemon %>% 
  filter(is_outlier(AtkTotal) == TRUE) %>% 
  mutate(AtkPercent = round(AtkTotal / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, AtkTotal, AtkPercent)
pokemon %>% 
  ggplot(aes(x = Type1, y = DefTotal, fill = Type1, color = Type1)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=45, hjust=1)) +
  scale_fill_manual(values = type_colors,
                    guide = "none") +
  scale_color_manual(values = type_colors_outline,
                    guide = "none")  + 
  labs(title = "Total Def by Primary Type")

pokemon %>% 
  filter(is_outlier(DefTotal) == TRUE) %>% 
  mutate(DefPercent = round(DefTotal / Total, 2)) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, DefTotal, DefPercent)
# Legendary Birds Color Scheme
pokemon %>% 
  ggplot(aes(x = classification, y = Total, color = classification, fill = classification)) + 
  geom_boxplot(show.legend = FALSE) + 
  scale_fill_manual(values = c("#d50808", "#ffd541", "#94c5ff"),
                    guide = "none") +
  scale_color_manual(values = c("#ffc54a", "#9c7b10", "#005273"),
                    guide = "none") + 
  labs(title = "Total Stats by Classification")
pokemon %>% 
 ggplot(aes(x=Total)) +
   geom_density(alpha=0.5, aes(fill=Type1)) +
   facet_wrap(~Type1) + 
   labs(x="Total", y="Density") +
  scale_fill_manual(values = type_colors,
                    guide = "none")
# Generation Mascot Color Scheme
pokemon %>% 
  ggplot(aes(x = Generation, y = Total, color = Generation, fill = Generation)) +
  geom_boxplot() + 
  scale_fill_manual(values = c("#2062ac", "#deac00", "#ff2029", "#205a94", "#181820", "#6275b9"),
                    guide = "none") +
  scale_color_manual(values = c("#F2684A", "#9cace6", "#313973", "#bd6ad5", "#bdbdd5", "#e9351c"),
                    guide = "none") + 
  labs(title = "Total Stats by Generation")
pokemon %>% 
  mutate(MaxAtk = pmax(Attack, SpAtk)) %>% 
  filter(MaxAtk > 100) %>% 
  ggplot(aes(x = Speed, y = MaxAtk)) +
  geom_point(aes(color = Type1)) +
  geom_smooth(method = 'lm') +
  scale_color_manual(values = type_colors)  + 
  labs(title = "Offensive Potential (Speed vs. MaxAtk)")

pokemon %>% 
  mutate(MaxAtk = pmax(Attack, SpAtk)) %>% 
  filter(MaxAtk >= 160 & Speed > 120) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Attack, SpAtk, MaxAtk,Speed)

pokemon %>% 
  mutate(MaxAtk = pmax(Attack, SpAtk)) %>% 
  filter(MaxAtk >= 100 & Speed < 40) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, Attack, SpAtk, MaxAtk,Speed)
pokemon %>% 
  mutate(MaxDef = pmax(Defense, SpDef)) %>% 
  ggplot(aes(x = HP, y = MaxDef)) +
  geom_point(aes(color = Type1)) +
  geom_smooth(method = 'lm') +
  scale_color_manual(values = type_colors)  + 
  labs(title = "Wall Potential (HP vs. MaxDef)")

pokemon %>% 
  mutate(MaxDef = pmax(Defense, SpDef)) %>% 
  filter(HP >= 150) %>% 
  arrange(-HP) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, HP, Defense, SpDef, MaxDef)

pokemon %>% 
  mutate(MaxDef = pmax(Defense, SpDef)) %>% 
  filter(MaxDef >= 150) %>% 
  arrange(-MaxDef) %>% 
  select(PokedexNum, Name, Type1, Type2, Total, HP, Defense, SpDef, MaxDef)
# Scary Max
pokemon %>% 
  summarise(n = max(PokedexNum + 1),
            name = "OP",
            Total = max(HP)+max(Attack)+max(SpAtk)+max(Defense)+max(SpDef) + max(Speed),
            HP = max(HP),
            Attack = max(Attack),
            SpAtk = max(SpAtk),
            Defense = max(Defense),
            SpDef = max(SpDef),
            Speed = max(Speed))

pokemon %>% 
  filter(Total == max(Total))

pokemon %>% 
  summarise(n = min(PokedexNum - 1),
            name = "OP",
            Total = min(HP)+min(Attack)+min(SpAtk)+min(Defense)+min(SpDef) + min(Speed),
            HP = min(HP),
            Attack = min(Attack),
            SpAtk = min(SpAtk),
            Defense = min(Defense),
            SpDef = min(SpDef),
            Speed = min(Speed))

pokemon %>% 
  filter(Total == min(Total))

pokemon %>% 
  summarise(n = as.integer(mean(PokedexNum - 1)),
            name = "OP",
            Total = as.integer(mean(HP))+as.integer(mean(Attack))+as.integer(mean(SpAtk))+as.integer(mean(Defense))+as.integer(mean(SpDef)) + as.integer(mean(Speed)),
            HP = as.integer(mean(HP)),
            Attack = as.integer(mean(Attack)),
            SpAtk = as.integer(mean(SpAtk)),
            Defense = as.integer(mean(Defense)),
            SpDef = as.integer(mean(SpDef)),
            Speed = as.integer(mean(Speed)))

pokemon %>% 
  filter(Total == 432)

pokemon %>% 
  summarise(n = as.integer(median(PokedexNum - 1)),
            name = "OP",
            Total = as.integer(median(HP))+as.integer(median(Attack))+as.integer(median(SpAtk))+as.integer(median(Defense))+as.integer(median(SpDef)) + as.integer(median(Speed)),
            HP = as.integer(median(HP)),
            Attack = as.integer(median(Attack)),
            SpAtk = as.integer(median(SpAtk)),
            Defense = as.integer(median(Defense)),
            SpDef = as.integer(median(SpDef)),
            Speed = as.integer(median(Speed)))

pokemon %>% 
  filter(Total == 410)
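
# The four summaries above repeat the same pattern with a different
# aggregate each time. A compact sketch of that pattern follows;
# `stat_profile` and the toy data frame are hypothetical, and the truncation
# with as.integer() mirrors the code above.
stat_profile <- function(data, stat_cols, fn) {
  # Apply the aggregate to each stat column, truncating to whole numbers
  vals <- vapply(data[stat_cols], function(x) as.integer(fn(x)), integer(1))
  # Total is the sum of the per-stat values, as in the summaries above
  c(Total = sum(vals), vals)
}

toy_stats <- data.frame(HP = c(10L, 20L, 60L), Attack = c(5L, 15L, 25L))
stat_profile(toy_stats, c("HP", "Attack"), max)
stat_profile(toy_stats, c("HP", "Attack"), median)
# On the real data, something like:
# stat_cols <- c("HP", "Attack", "Defense", "SpAtk", "SpDef", "Speed")
# stat_profile(pokemon, stat_cols, mean)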
# Chansey Color Scheme
pokemon %>% 
ggplot(aes(x=HP)) +
  geom_histogram(binwidth=4, fill="#ffacac", colour="#ff835a") + 
  labs(x="HP", y="Frequency") 

# Landorus Color Scheme
pokemon %>% 
ggplot(aes(x=Attack)) +
  geom_histogram(binwidth=4, fill="#f67b41", colour="#83624a") + 
  labs(x="Attack", y="Frequency") 

# Greninja
pokemon %>% 
ggplot(aes(x=SpAtk)) +
  geom_histogram(binwidth=4, fill="#354698", colour="#e7788d") + 
  labs(x="SpAtk", y="Frequency") 

# Steelix
pokemon %>% 
ggplot(aes(x=Defense)) +
  geom_histogram(binwidth=4, fill="#7b94a4", colour="#dee6de") + 
  labs(x="Defense", y="Frequency") 

# Shuckle
pokemon %>% 
ggplot(aes(x=SpDef)) +
  geom_histogram(binwidth=4, fill="#b43129", colour="#ffff5a") + 
  labs(x="SpDef", y="Frequency") 

#Deoxys
pokemon %>% 
ggplot(aes(x=Speed)) +
  geom_histogram(binwidth=4, fill="#5294ac", colour="#ff734a") + 
  labs(x="Speed", y="Frequency") 

#Mewtwo
pokemon %>% 
ggplot(aes(x=Total)) +
  geom_histogram(binwidth=10, fill="#6a319c", colour="#b4acc5") + 
  labs(x="Total", y="Frequency") 
pokemon %>% 
  ggplot(aes(x=HP, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="HP", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=Attack, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="Attack", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=SpAtk, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="SpAtk", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=Defense, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="Defense", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=SpDef, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="SpDef", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=Speed, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="Speed", y="Density", title = "Legendary Comparison")

pokemon %>% 
  ggplot(aes(x=Total, fill=Legendary)) +
  geom_density(alpha=0.5) +
  labs(x="Total", y="Density", title = "Legendary Comparison")

pokemon %>% 
  group_by(Generation) %>% 
  summarise(avg = as.integer(mean(Total))) %>% 
  ggplot(aes(x=Generation, y = avg, group = 1)) +
  geom_line() +
  geom_point(color = "red") +
  labs(title = "Average Total for each Generation")
pokemon %>%
  group_by(Generation) %>%
  summarize(HP=mean(HP),
            Attack=mean(Attack),
            Defense=mean(Defense),
            SpAtk=mean(SpAtk),
            SpDef=mean(SpDef),
            Speed=mean(Speed)) %>%
  gather(Stats, value, 2:7) %>%
  ggplot(aes(x=Generation, y=value, group=1)) +
    geom_line() +
    geom_point(color = "red") +
    facet_wrap(~Stats) +
    labs(y="Average Stats")
#https://drmowinckels.io/blog/adding-external-images-to-plots/
#download.file("https://cdn.bulbagarden.net/upload/5/56/242Blissey.png", "blissey.png")
# download.file("http://cdn.bulbagarden.net/upload/f/f8/242MS.png", "blissey-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/e/ea/113MS.png", "chansey-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/f/fa/202MS.png", "wobb-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/e/ec/321MS.png", "wailord-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/5/5a/594MS.png", "alo-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/e/e0/143XYMS.png", "snorlax-h-sprite.png")
# download.file("cdn.bulbagarden.net/upload/0/0d/289MS.png", "slaking-h-sprite.png")
# download.file("https://archives.bulbagarden.net/media/upload/e/e8/487MS.png", "g-alt-h-sprite.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/2f/487OMS.png", "g-o-h-sprite.png")
# download.file("https://archives.bulbagarden.net/media/upload/4/45/426MS.png", "drif-h-sprite.png")
hp_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/",
                                       pattern = "-h-"), 
                     pokemon %>% 
  top_n(10, HP) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(HP)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
hp_sprites
img = readPNG("images/blissey.png")
g =  rasterGrob(img, interpolate=TRUE)
g_sprite = list()
hp_plot <- pokemon %>%
  select(Name, HP) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, HP), y=HP)) +
  geom_bar(aes(fill=HP), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=HP)) +
  scale_fill_gradient(low="#ff835a", high="#ffacac") + 
  coord_flip() +
  labs(x="Name", title="Top 10 HP Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=7, ymin=160, ymax=260)
for(i in 1:nrow(hp_sprites)){
  img = readPNG(paste0("images/",hp_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  hp_plot = hp_plot +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-15, ymax=2.5)
}

hp_plot
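
# The sprite loop above is repeated for every stat that follows. One way it
# could be factored out is sketched here; `add_sprite_row` is a hypothetical
# helper, and the demo uses tiny in-memory rasters in place of the PNG files.
# seq_along() also skips the loop cleanly when the list of grobs is empty.
library(ggplot2)
library(grid)

add_sprite_row <- function(plot, grobs, ymin, ymax, half_width = 5) {
  for (i in seq_along(grobs)) {
    # Center the i-th sprite on the i-th bar, as in the loop above
    plot <- plot +
      annotation_custom(grob = grobs[[i]],
                        xmin = i - half_width, xmax = i + half_width,
                        ymin = ymin, ymax = ymax)
  }
  plot
}

demo_grobs <- list(rasterGrob(matrix(c(0, 1, 1, 0), nrow = 2)),
                   rasterGrob(matrix(c(1, 0, 0, 1), nrow = 2)))
demo_plot <- ggplot(data.frame(x = c("a", "b"), y = c(1, 2)), aes(x, y)) +
  geom_col() +
  coord_flip()
demo_plot <- add_sprite_row(demo_plot, demo_grobs, ymin = -0.2, ymax = 0)
# With the objects above, the HP loop might become:
# hp_plot <- add_sprite_row(hp_plot, g_sprite, ymin = -15, ymax = 2.5)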
# download.file("https://archives.bulbagarden.net/media/upload/6/67/150MXMS.png", "mewtwo-a-x.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/72/214MMS.png", "hera-a-m.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ad/384MMS.png", "ray-a-m.png")
# download.file("https://archives.bulbagarden.net/media/upload/9/98/383PMS.png", "groudon-a-p.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/07/386AMS.png", "deoxys-a-aform.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c0/646BMS.png", "kyurem-a-b.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/7f/445MMS.png", "garch-a-m.png")
# download.file("https://archives.bulbagarden.net/media/upload/9/91/409MS.png", "ramp-a-sprite.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0a/475MMS.png", "gallade-a-m.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/5e/354MMS.png", "banette-a-m.png")
atk_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/",
                                        pattern = "-a-"), pokemon %>% 
  top_n(10, Attack) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(Attack)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
atk_sprites
#download.file("https://cdn.bulbagarden.net/upload/7/7f/150Mewtwo-Mega_X.png", "mega-X.png")
img = readPNG("images/mega-X.png")
g =  rasterGrob(img, interpolate=TRUE)
g_sprite = list()
atk_graph <- pokemon %>%
  select(Name, Attack) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, Attack), y=Attack)) +
  geom_bar(aes(fill=Attack), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=Attack)) +
  scale_fill_gradient(low="#b4acc5", high="#6a319c") + 
  coord_flip() +
  labs(x="Name", title="Top 10 Attack Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=155, ymax=210)

for(i in 1:nrow(atk_sprites)){
  img = readPNG(paste0("images/",atk_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  atk_graph = atk_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

atk_graph
# download.file("https://archives.bulbagarden.net/media/upload/2/29/181MMS.png", "ampharos-spa-,.png")
# download.file("https://archives.bulbagarden.net/media/upload/3/34/282MMS.png", "gardevoir-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/f4/094MMS.png", "gengar-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/64/720UMS.png", "hoopa-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/74/646WMS.png", "kyurem-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0c/065MMS.png", "alakazam-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/07/386AMS.png", "deoxys-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/02/382PMS.png", "kyogre-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ad/384MMS.png", "ray-spa-.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/ff/150MYMS.png", "mewtwo-spa-.png")
spatk_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/",pattern = "-spa-"), pokemon %>% 
  top_n(10, SpAtk) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(SpAtk)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
spatk_sprites
#download.file("https://cdn.bulbagarden.net/upload/5/5f/150Mewtwo-Mega_Y.png", "mega-Y.png")
img = readPNG("images/mega-Y.png")
g =  rasterGrob(img, interpolate=TRUE)
spatk_graph <- pokemon %>%
  select(Name, SpAtk) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, SpAtk), y=SpAtk)) +
  geom_bar(aes(fill=SpAtk), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=SpAtk)) +
  scale_fill_gradient(low="#b4acc5", high="#6a319c") + 
  coord_flip() +
  labs(x="Name", title="Top 10 SpAtk Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=4, ymin=160, ymax=210)

for(i in 1:nrow(spatk_sprites)){
  img = readPNG(paste0("images/",spatk_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  spatk_graph = spatk_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

spatk_graph
# download.file("https://archives.bulbagarden.net/media/upload/5/5b/306MMS.png", "aggron-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/d/d5/306MS.png", "aggron-d-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/53/713MS.png", "avalugg-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/e/ef/411MS.png", "bastiodon-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ac/091XYMS.png", "cloyster-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/6c/377MS.png", "regirock-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0a/213MS.png", "shuckle-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/65/080MMS.png", "slobrow-d.png")
# download.file("https://archives.bulbagarden.net/media/upload/b/bf/208MS.png", "steelix-d-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/05/208MMS.png", "steelix-d.png")
def_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/",
                                        pattern = "-d"), pokemon %>% 
  top_n(10, Defense) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(Defense)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
def_sprites
#download.file("https://cdn.bulbagarden.net/upload/1/1b/208Steelix-Mega.png", "mega-steel.png")
img = readPNG("images/mega-steel.png")
g =  rasterGrob(img, interpolate=TRUE)
def_graph <- pokemon %>%
  select(Name, Defense) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, Defense), y=Defense)) +
  geom_bar(aes(fill=Defense), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=Defense)) +
  scale_fill_gradient(low="#dee6de", high="#7b94a4") + 
  coord_flip() +
  labs(x="Name", title="Top 10 Defense Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=180, ymax=245)

for(i in 1:nrow(def_sprites)){
  img = readPNG(paste0("images/",def_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  def_graph = def_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

def_graph
# download.file("https://archives.bulbagarden.net/media/upload/3/37/681MS.png", "aegislash-spd,.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/5b/703MS.png", "carbink-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/20/386DMS.png", "deoxys-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/53/719MS.png", "diancie-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/e/ea/671MS.png", "florges-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/3/3c/706MS.png", "goodra-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/e/ee/250MS.png", "Hooh-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/02/382PMS.png", "kyogre-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/24/380MMS.png", "latias-spd-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c8/249MS.png", "lugia-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/aa/476MS.png", "probopass-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/9/99/378MS.png", "regice-spd.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/2e/379MS.png", "registeel-spd-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0a/213MS.png", "shuckle-spd.png")
spdef_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/", pattern = "-spd"), pokemon %>% 
  top_n(10, SpDef) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(SpDef)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
spdef_sprites
#download.file("https://cdn.bulbagarden.net/upload/c/c7/213Shuckle.png", "shuckle.png")
img = readPNG("images/shuckle.png")
g =  rasterGrob(img, interpolate=TRUE)
spdef_graph <- pokemon %>%
  select(Name, SpDef) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, SpDef), y=SpDef)) +
  geom_bar(aes(fill=SpDef), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=SpDef)) +
  scale_fill_gradient(low="#ffff5a", high="#b43129") + 
  coord_flip() +
  labs(x="Name", title="Top 10 SpDef Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=180, ymax=230)

for(i in 1:nrow(spdef_sprites)){
  img = readPNG(paste0("images/", spdef_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  spdef_graph = spdef_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

spdef_graph
# download.file("https://archives.bulbagarden.net/media/upload/0/07/386AMS.png", "deoxys-spe-a.png")
# download.file("https://archives.bulbagarden.net/media/upload/8/86/386MS.png", "deoxys-spe-n.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/fa/386SMS.png", "deoxys-spe-s.png")
# download.file("https://archives.bulbagarden.net/media/upload/3/33/617MS.png", "accelgor-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c4/142MMS.png", "aerodac-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/0c/065MMS.png", "alakazam-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c8/015MMS.png", "buzz-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/69/101MS.png", "electrocude-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/ff/150MYMS.png", "mewtwo-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/54/291MS.png", "ninjask-spe.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/27/254MMS.png", "sceptile-spe.png")
speed_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/", pattern = "-spe"), pokemon %>% 
  top_n(10, Speed) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(Speed)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
speed_sprites
#download.file("https://cdn.bulbagarden.net/upload/2/2b/386Deoxys-Speed.png", "speed.png")
img = readPNG("images/speed.png")
g =  rasterGrob(img, interpolate=TRUE)
speed_graph <- pokemon %>%
  select(Name, Speed) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, Speed), y=Speed)) +
  geom_bar(aes(fill=Speed), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=Speed)) +
  scale_fill_gradient(low="#5294ac", high="#ff734a") + 
  coord_flip() +
  labs(x="Name", title="Top 10 Speed Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=150, ymax=200)

for(i in 1:nrow(speed_sprites)){
  img = readPNG(paste0("images/",speed_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  speed_graph = speed_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-10, ymax=2.5)
}

speed_graph
# download.file("https://archives.bulbagarden.net/media/upload/0/0e/493OD_DP.png", "arceus-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ae/719MMS.png", "diancie-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/7f/445MMS.png", "garchomp-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/9/98/383PMS.png", "groudon-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/0/02/382PMS.png", "kyogre-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/c/c0/646BMS.png", "kyurem-tol-b.png")
# download.file("https://archives.bulbagarden.net/media/upload/7/74/646WMS.png", "kyurem-tol-w.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/24/380MMS.png", "latias-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/2/2d/381MMS.png", "latios-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/54/376MMS.png", "metagross-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/6/67/150MXMS.png", "mewtwo-tol-x.png")
# download.file("https://archives.bulbagarden.net/media/upload/f/ff/150MYMS.png", "mewtwo-tol-y.png")
# download.file("https://archives.bulbagarden.net/media/upload/a/ad/384MMS.png", "ray-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/4/4a/373MMS.png", "salamence-tol.png")
# download.file("https://archives.bulbagarden.net/media/upload/5/59/248MMS.png", "ttar-tol.png")
total_sprites <- tibble(images=list.files(path = "~/Desktop/code/R/Pokemon/images/", pattern = "-tol"), pokemon %>% 
  top_n(10, Total) %>% 
  arrange(Name) %>% 
  mutate(rank = row_number(Total)) %>% 
  select(Name, rank)) %>% 
  arrange(rank)
total_sprites
#download.file("https://cdn.bulbagarden.net/upload/5/58/384Rayquaza-Mega.png", "mega-raq.png")
img = readPNG("images/mega-raq.png")
g =  rasterGrob(img, interpolate=TRUE)
total_graph <- pokemon %>%
  select(Name, Total) %>%
  top_n(10) %>% 
  ggplot(aes(x=reorder(Name, Total), y=Total)) +
  geom_bar(aes(fill=Total), stat="identity", colour="black", show.legend=FALSE) +
  geom_label(aes(label=Total)) +
  coord_flip() +
  scale_fill_gradient(low="#f6de00", high="#5abd8b") + 
  labs(x="Name", title="Top 10 Total Pokémon") +
  annotation_custom(grob=g, xmin=0, xmax=5, ymin=600, ymax=900)

for(i in 1:nrow(total_sprites)){
  img = readPNG(paste0("images/",total_sprites$images[i]))
  g_sprite[[i]] =  rasterGrob(img, interpolate=TRUE)
  
  total_graph = total_graph +
    annotation_custom(grob=g_sprite[[i]], xmin=i-5, xmax=i+5, ymin=-40, ymax=10)
}

total_graph
pokemon %>% 
  ggplot(aes(x = PokedexNum, y = Total, color = classification)) +
  geom_point()
pokemon %>%
  count(Generation) %>%
  ggplot(aes(x=Generation, y=n, fill = Generation, color = Generation)) + 
  geom_bar(stat="identity") +
  geom_label(aes(label=n)) +
  labs(x="Generation", y="Number of Pokémon",
       title="Number of Pokémon per generation") +
  scale_fill_manual(values = c("#2062ac", "#deac00", "#ff2029", "#cdb4d5", "#181820", "#6275b9"),
                    guide = "none") +
  scale_color_manual(values = c("white", "white", "white", "white", "white", "white"),
                    guide = "none")
ggplot(pokemon, aes(x=Type1, fill=Generation)) + 
  geom_bar() +
  labs(x="Primary Type", y="Number of Pokémon",
       title="Number of Pokémon of each primary type per generation") +
  scale_fill_manual(values = c("#2062ac", "#deac00", "#ff2029", "#cdb4d5", "#181820", "#6275b9"),
                    guide = "none")
ggplot(pokemon, aes(x=Generation, fill=Type1)) + 
  geom_bar() +
  labs(x="Generation", y="Number of Pokémon",
       title="Number of Pokémon of each primary type per generation") +
  scale_fill_manual(values = type_colors)
ggplot(pokemon, aes(x=Generation, fill=isMultiType)) + 
  geom_bar() +
  labs(x="Generation", y="Number of Pokémon",
       title="Number of single-type and multi-type Pokémon per generation") 
ggplot(pokemon, aes(x=Generation, fill=isMega)) + 
  geom_bar() +
  labs(x="Generation", y="Number of Pokémon",
       title="Number of Mega and non-Mega Pokémon per generation") 
library(class) #for KNN
library(caret) #for cross validation of KNN method
test <- pokemon %>% group_by(classification) %>% sample_frac(.2)
train_data <- setdiff(pokemon, test)

dim(test)
dim(train_data)

glimpse(test)
glimpse(train_data)
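
# The split above is random, so the numbers that follow change on every run.
# A minimal sketch of a reproducible stratified split, assuming a fixed seed
# is acceptable; `stratified_split`, the seed value, and the toy data are
# hypothetical. Note that slice_sample(prop = ) rounds group sizes down,
# which can differ slightly from sample_frac().
library(dplyr)

stratified_split <- function(data, strata_col, test_frac, seed) {
  set.seed(seed)  # fixes the random draw so reruns give the same split
  data <- mutate(data, row_id = row_number())
  test <- data %>%
    group_by({{ strata_col }}) %>%
    slice_sample(prop = test_frac) %>%
    ungroup()
  # Everything not drawn for the test set becomes training data
  train <- anti_join(data, test, by = "row_id")
  list(train = select(train, -row_id), test = select(test, -row_id))
}

toy_split <- tibble(Name = letters[1:6],
                    classification = rep(c("Normal", "Legendary"), times = c(4, 2)))
parts <- stratified_split(toy_split, classification, test_frac = 0.5, seed = 1)
nrow(parts$test)   # 3: two "Normal" rows and one "Legendary" row
nrow(parts$train)  # 3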
class.knn.20 = knn(
  train = train_data[6:11], # training data for the features used in classification
  test = test[6:11], # test data for the features used in classification
  cl = train_data$classification, # vector of class labels for the training data
  k = 20)
class.knn.20
tibble(test[c("Name", "classification")], class.knn.20)
class.knn.conf.20 = table(true = test$classification, predicted = class.knn.20)
class.knn.conf.20

class.knn.50 = knn(
  train = train_data[6:11], # training data for the features used in classification
  test = test[6:11], # test data for the features used in classification
  cl = train_data$classification, # vector of class labels for the training data
  k = 50)
class.knn.50
tibble(test[c("Name", "classification")], class.knn.50)
class.knn.conf.50 = table(true = test$classification, predicted = class.knn.50)
class.knn.conf.50
trControl <- trainControl(method  = "cv",
                          number  = 20)

fit <- train(classification ~  HP + Attack + Defense + SpAtk + SpDef + Speed,
             method     = "knn",
             tuneGrid   = expand.grid(k = 1:50),
             trControl  = trControl,
             metric     = "Accuracy",
             data       = train_data
)
fit

k = fit$results$k[which.max(fit$results$Accuracy)]
class.knn = knn(
  train = train_data[6:11], # training data for features used in classification
  test = test[6:11], # test data for features used in classification
  cl = train_data$classification, # vector of class labels for training data
  k = k)
class.knn
error_knn <- tibble(test[c("Name", "classification")], class.knn) %>% 
  filter(classification != class.knn)
error_knn
class.knn.conf = table(true = test$classification, predicted = class.knn)
class.knn.conf
nrow(error_knn)
error_rate = nrow(error_knn)/nrow(test) # share of misclassified test rows
error_rate
detach("package:class", unload = TRUE)
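
The error rate computed above is the share of misclassified test rows. As a cross-check, the same number can be read off a confusion matrix: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total. Here is a minimal sketch on a made-up 2x2 table (the counts and the names toy_conf, toy_accuracy and toy_error are hypothetical, not taken from the Pokemon data). It assumes the table is square, with the same classes in the same order on both axes.

```r
# Hypothetical confusion matrix: rows = true class, columns = predicted class
toy_conf <- as.table(matrix(
  c(50,  5,
    10, 35),
  nrow = 2, byrow = TRUE,
  dimnames = list(true = c("A", "B"), predicted = c("A", "B"))
))

# Correct predictions are on the diagonal
toy_accuracy <- sum(diag(toy_conf)) / sum(toy_conf)
toy_error <- 1 - toy_accuracy
toy_error
```

With 85 of 100 rows on the diagonal, the error rate here works out to 0.15.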
testing_df <- train_data %>% 
  select(classification, HP, Attack, Defense, SpAtk, SpDef, Speed)

utrain <- upSample(testing_df[,-1], testing_df$classification)

table(utrain$Class)
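
To see what upSample() is doing here, it helps to try it on a tiny made-up data set. This is only a sketch: toy_x, toy_y and toy_up are hypothetical names, and the numbers are invented. upSample() resamples the smaller class with replacement until every class has as many rows as the largest one, and it returns the predictors with the labels appended as a final column named Class.

```r
library(caret)

# Hypothetical toy data: 6 "common" rows and 2 "rare" rows
toy_x <- data.frame(stat_a = c(10, 20, 30, 40, 50, 60, 70, 80),
                    stat_b = c(5, 15, 25, 35, 45, 55, 65, 75))
toy_y <- factor(c(rep("common", 6), rep("rare", 2)))

set.seed(42) # the resampling is random, so fix the seed for repeatability
toy_up <- upSample(x = toy_x, y = toy_y)

table(toy_up$Class) # both classes now have 6 rows
names(toy_up)       # predictors first, then the "Class" column
```

Because Class is appended last, dropping the final column of the upsampled data recovers just the predictors.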
fit <- train(Class ~  .,
             method     = "knn",
             tuneGrid   = expand.grid(k = 1:50),
             trControl  = trControl,
             metric     = "Accuracy",
             data       = utrain
)
fit

k = fit$results$k[which.max(fit$results$Accuracy)]
library(class) #for KNN

class.knn = knn(
  train = utrain[,-7], # upsampled training features (drops the Class column)
  test = test[6:11], # test data for features used in classification
  cl = utrain$Class, # vector of class labels for upsampled training data
  k = k)
error_knn <- tibble(test[c("Name", "classification")], class.knn) %>% 
  filter(classification != class.knn)
error_knn
class.knn.conf = table(true = test$classification, predicted = class.knn)
class.knn.conf
nrow(error_knn)
error_rate = nrow(error_knn)/nrow(test) # share of misclassified test rows
error_rate

detach("package:class", unload = TRUE)
dtrain <- downSample(testing_df[,-1], testing_df$classification)

table(dtrain$Class)
fit <- train(Class ~  .,
             method     = "knn",
             tuneGrid   = expand.grid(k = 1:50),
             trControl  = trControl,
             metric     = "Accuracy",
             data       = dtrain
)
fit

k = fit$results$k[which.max(fit$results$Accuracy)]
library(class) #for KNN
library(caret) 

class.knn = knn(
  train = dtrain[,-7], # downsampled training features (drops the Class column)
  test = test[6:11], # test data for features used in classification
  cl = dtrain$Class, # vector of class labels for downsampled training data
  k = k)
error_knn <- tibble(test[c("Name", "classification")], class.knn) %>% 
  filter(classification != class.knn)
error_knn
class.knn.conf = table(true = test$classification, predicted = class.knn)
class.knn.conf
nrow(error_knn)
error_rate = nrow(error_knn)/nrow(test) # share of misclassified test rows
error_rate

detach("package:class", unload = TRUE)